Eric Lee / smarc-fsl-linux-kernel

29 Oct, 2020

2 commits

67d3ed576 Merge 'v5.10-rc1' into android-mainline ... Browse Code »

Linux 5.10-rc1

Signed-off-by: Greg Kroah-Hartman
Change-Id: Iace3fc84a00d3023c75caa086a266de17dc1847c

Greg Kroah-Hartman
2020-10-29 13:32:38 +0800
338f8e80b Merge 0746c4a9f3d3 ("Merge branch 'i2c/for-5.10' of git://git.kernel.org/pub/scm… ... Browse Code »

…/linux/kernel/git/wsa/linux") into android-mainline

Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iec426c6de4a59a517e5fa575a9424b883d958f08

Greg Kroah-Hartman
2020-10-29 04:20:34 +0800

26 Oct, 2020

3 commits

05d2a661f Merge 54a4c789ca80 ("Merge tag 'docs/v5.10-1' of git://git.kernel.org/pub/scm/li… ... Browse Code »

…nux/kernel/git/mchehab/linux-media") into android-mainline

Steps on the way to 5.10-rc1

Resolves conflicts in:
fs/userfaultfd.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie3fe3c818f1f6565cfd4fa551de72d2b72ef60af

Greg Kroah-Hartman
2020-10-26 16:23:33 +0800
986b9eacb kernel/sys.c: fix prototype of prctl_get_tid_address() ... Browse Code »

tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace". So
sparse rightfully complains about passing a kernel pointer to
put_user().

Reported-by: kernel test robot
Signed-off-by: Rasmus Villemoes
Signed-off-by: Linus Torvalds

Rasmus Villemoes
2020-10-26 02:44:16 +0800
81ecf91ea Merge tag 'safesetid-5.10' of git://github.com/micah-morton/linux ... Browse Code »

Pull SafeSetID updates from Micah Morton:
"The changes are mostly contained to within the SafeSetID LSM, with the
exception of a few 1-line changes to change some ns_capable() calls to
ns_capable_setid() -- causing a flag (CAP_OPT_INSETID) to be set that
is examined by SafeSetID code and nothing else in the kernel.

The changes to SafeSetID internally allow for setting up GID
transition security policies, as already existed for UIDs"

* tag 'safesetid-5.10' of git://github.com/micah-morton/linux:
LSM: SafeSetID: Fix warnings reported by test bot
LSM: SafeSetID: Add GID security policy handling
LSM: Signal to SafeSetID when setting group IDs

Linus Torvalds
2020-10-26 01:45:26 +0800

17 Oct, 2020

1 commit

15ec0fcff kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map() ... Browse Code »

Replace do_brk with do_brk_flags in comment of prctl_set_mm_map(), since
do_brk was removed in following commit.

Fixes: bb177a732c4369 ("mm: do not bug_on on incorrect length in __mm_populate()")
Signed-off-by: Liao Pingfang
Signed-off-by: Yi Wang
Signed-off-by: Andrew Morton
Link: https://lkml.kernel.org/r/1600650751-43127-1-git-send-email-wang.yi59@zte.com.cn
Signed-off-by: Linus Torvalds

Liao Pingfang
2020-10-17 02:11:19 +0800

14 Oct, 2020

1 commit

111767c1d LSM: Signal to SafeSetID when setting group IDs ... Browse Code »

For SafeSetID to properly gate set*gid() calls, it needs to know whether
ns_capable() is being called from within a sys_set*gid() function or is
being called from elsewhere in the kernel. This allows SafeSetID to deny
CAP_SETGID to restricted groups when they are attempting to use the
capability for code paths other than updating GIDs (e.g. setting up
userns GID mappings). This is the identical approach to what is
currently done for CAP_SETUID.

NOTE: We also add signaling to SafeSetID from the setgroups() syscall,
as we have future plans to restrict a process' ability to set
supplementary groups in addition to what is added in this series for
restricting setting of the primary group.

Signed-off-by: Thomas Cedeno
Signed-off-by: Micah Morton

Thomas Cedeno
2020-10-14 00:17:34 +0800

01 Sep, 2020

1 commit

f022c0602 Merge 5.9-rc3 into android-mainline ... Browse Code »

Linux 5.9-rc3

Signed-off-by: Greg Kroah-Hartman
Change-Id: Ic7758bc57a7d91861657388ddd015db5c5db5480

Greg Kroah-Hartman
2020-09-01 01:51:25 +0800

24 Aug, 2020

1 commit

df561f668 treewide: Use fallthrough pseudo-keyword ... Browse Code »

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva

Gustavo A. R. Silva
2020-08-24 06:36:59 +0800

07 Aug, 2020

1 commit

f04a11ac9 Merge 7b4ea9456dd3 ("Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings"") … ... Browse Code »

…into android-mainline

Steps on the way to 5.9-rc1

Resolves conflicts in:
drivers/irqchip/qcom-pdc.c
include/linux/device.h
net/xfrm/xfrm_state.c
security/lsm_audit.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4aeb3d04f4717714a421721eb3ce690c099bb30a

Greg Kroah-Hartman
2020-08-07 22:01:35 +0800

20 Jul, 2020

2 commits

227175b2c prctl: exe link permission error changed from -EINVAL to -EPERM ... Browse Code »

This brings consistency with the rest of the prctl() syscall where
-EPERM is returned when failing a capability check.

Signed-off-by: Nicolas Viennot
Signed-off-by: Adrian Reber
Reviewed-by: Serge Hallyn
Link: https://lore.kernel.org/r/20200719100418.2112740-7-areber@redhat.com
Signed-off-by: Christian Brauner

Nicolas Viennot
2020-07-20 02:14:42 +0800
ebd6de681 prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe ... Browse Code »

Originally, only a local CAP_SYS_ADMIN could change the exe link,
making it difficult for doing checkpoint/restore without CAP_SYS_ADMIN.
This commit adds CAP_CHECKPOINT_RESTORE in addition to CAP_SYS_ADMIN
for permitting changing the exe link.

The following describes the history of the /proc/self/exe permission
checks as it may be difficult to understand what decisions lead to this
point.

* [1] May 2012: This commit introduces the ability of changing
/proc/self/exe if the user is CAP_SYS_RESOURCE capable.
In the related discussion [2], no clear thread model is presented for
what could happen if the /proc/self/exe changes multiple times, or why
would the admin be at the mercy of userspace.

* [3] Oct 2014: This commit introduces a new API to change
/proc/self/exe. The permission no longer checks for CAP_SYS_RESOURCE,
but instead checks if the current user is root (uid=0) in its local
namespace. In the related discussion [4] it is said that "Controlling
exe_fd without privileges may turn out to be dangerous. At least
things like tomoyo examine it for making policy decisions (see
tomoyo_manager())."

* [5] Dec 2016: This commit removes the restriction to change
/proc/self/exe at most once. The related discussion [6] informs that
the audit subsystem relies on the exe symlink, presumably
audit_log_d_path_exe() in kernel/audit.c.

* [7] May 2017: This commit changed the check from uid==0 to local
CAP_SYS_ADMIN. No discussion.

* [8] July 2020: A PoC to spoof any program's /proc/self/exe via ptrace
is demonstrated

Overall, the concrete points that were made to retain capability checks
around changing the exe symlink is that tomoyo_manager() and
audit_log_d_path_exe() uses the exe_file path.

Christian Brauner said that relying on /proc//exe being immutable (or
guarded by caps) in a sake of security is a bit misleading. It can only
be used as a hint without any guarantees of what code is being executed
once execve() returns to userspace. Christian suggested that in the
future, we could call audit_log() or similar to inform the admin of all
exe link changes, instead of attempting to provide security guarantees
via permission checks. However, this proposed change requires the
understanding of the security implications in the tomoyo/audit subsystems.

[1] b32dfe377102 ("c/r: prctl: add ability to set new mm_struct::exe_file")
[2] https://lore.kernel.org/patchwork/patch/292515/
[3] f606b77f1a9e ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
[4] https://lore.kernel.org/patchwork/patch/479359/
[5] 3fb4afd9a504 ("prctl: remove one-shot limitation for changing exe link")
[6] https://lore.kernel.org/patchwork/patch/697304/
[7] 4d28df6152aa ("prctl: Allow local CAP_SYS_ADMIN changing exe_file")
[8] https://github.com/nviennot/run_as_exe

Signed-off-by: Nicolas Viennot
Signed-off-by: Adrian Reber
Link: https://lore.kernel.org/r/20200719100418.2112740-6-areber@redhat.com
Signed-off-by: Christian Brauner

Nicolas Viennot
2020-07-20 02:14:42 +0800

25 Jun, 2020

2 commits

a33191d44 Merge 5.8-rc1 into android-mainline ... Browse Code »

Linux 5.8-rc1

Signed-off-by: Greg Kroah-Hartman
Change-Id: I00f2168bc9b6fd8e48c7c0776088d2c6cb8e1629

Greg Kroah-Hartman
2020-06-25 20:25:32 +0800
1da58ab55 ANDROID: fix up direct access to mmap_sem ... Browse Code »

It's now being abstracted away, so fix up the ANDROID specific code that
touches the lock to use the correct functions instead.

Signed-off-by: Greg Kroah-Hartman
Change-Id: I9cedb62118cd1bba7af27deb11499d312d24d7fc

Greg Kroah-Hartman
2020-06-25 00:17:12 +0800

24 Jun, 2020

1 commit

a253db891 Merge ad57a1022f9e ("Merge tag 'exfat-for-5.8-rc1' of git://git.kernel.org/pub/s… ... Browse Code »

…cm/linux/kernel/git/linkinjeon/exfat") into android-mainline

Steps on the way to 5.8-rc1.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4bc42f572167ea2f815688b4d1eb6124b6d260d4

Greg Kroah-Hartman
2020-06-24 23:54:12 +0800

15 Jun, 2020

2 commits

4a87b197c Merge tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux ... Browse Code »

Pull SafeSetID update from Micah Morton:
"Add additional LSM hooks for SafeSetID

SafeSetID is capable of making allow/deny decisions for set*uid calls
on a system, and we want to add similar functionality for set*gid
calls.

The work to do that is not yet complete, so probably won't make it in
for v5.8, but we are looking to get this simple patch in for v5.8
since we have it ready.

We are planning on the rest of the work for extending the SafeSetID
LSM being merged during the v5.9 merge window"

* tag 'LSM-add-setgid-hook-5.8-author-fix' of git://github.com/micah-morton/linux:
security: Add LSM hooks to set*gid syscalls

Linus Torvalds
2020-06-15 02:39:31 +0800
39030e135 security: Add LSM hooks to set*gid syscalls ... Browse Code »

The SafeSetID LSM uses the security_task_fix_setuid hook to filter
set*uid() syscalls according to its configured security policy. In
preparation for adding analagous support in the LSM for set*gid()
syscalls, we add the requisite hook here. Tested by putting print
statements in the security_task_fix_setgid hook and seeing them get hit
during kernel boot.

Signed-off-by: Thomas Cedeno
Signed-off-by: Micah Morton

Thomas Cedeno
2020-06-15 01:52:02 +0800

12 Jun, 2020

2 commits

a64320f5b Merge 94709049fb84 ("Merge branch 'akpm' (patches from Andrew)") into android-mainline ... Browse Code »

Tiny merge resolutions along the way to 5.8-rc1.

Change-Id: I24b3cca28ed36f32c92b6374dae5d7f006d3bced
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2020-06-12 20:39:46 +0800
e93a57129 Merge 062ea674ae8a ("Merge branch 'uaccess.__copy_to_user' of git://git.kernel.o… ... Browse Code »

…rg/pub/scm/linux/kernel/git/viro/vfs") into android-mainline

baby steps on the way to 5.8-rc1

Change-Id: I45a38d5752e33d1f9a27bce6055435811512ea87
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>

Greg Kroah-Hartman
2020-06-12 19:55:55 +0800

10 Jun, 2020

2 commits

c1e8d7c6a mmap locking API: convert mmap_sem comments ... Browse Code »

Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Vlastimil Babka
Reviewed-by: Daniel Jordan
Cc: Davidlohr Bueso
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Laurent Dufour
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds

Michel Lespinasse
2020-06-10 00:39:14 +0800
d8ed45c5d mmap locking API: use coccinelle to convert mmap_sem rwsem call sites ... Browse Code »

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse
Signed-off-by: Andrew Morton
Reviewed-by: Daniel Jordan
Reviewed-by: Laurent Dufour
Reviewed-by: Vlastimil Babka
Cc: Davidlohr Bueso
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Jason Gunthorpe
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Liam Howlett
Cc: Matthew Wilcox
Cc: Peter Zijlstra
Cc: Ying Han
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds

Michel Lespinasse
2020-06-10 00:39:14 +0800

03 Jun, 2020

2 commits

94709049f Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge updates from Andrew Morton:
"A few little subsystems and a start of a lot of MM patches.

Subsystems affected by this patch series: squashfs, ocfs2, parisc,
vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
swap, memcg, pagemap, memory-failure, vmalloc, kasan"

* emailed patches from Andrew Morton : (128 commits)
kasan: move kasan_report() into report.c
mm/mm_init.c: report kasan-tag information stored in page->flags
ubsan: entirely disable alignment checks under UBSAN_TRAP
kasan: fix clang compilation warning due to stack protector
x86/mm: remove vmalloc faulting
mm: remove vmalloc_sync_(un)mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
x86/mm/64: implement arch_sync_kernel_mappings()
mm/ioremap: track which page-table levels were modified
mm/vmalloc: track which page-table levels were modified
mm: add functions to track page directory modifications
s390: use __vmalloc_node in stack_alloc
powerpc: use __vmalloc_node in alloc_vm_stack
arm64: use __vmalloc_node in arch_alloc_vmap_stack
mm: remove vmalloc_user_node_flags
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove __vmalloc_node_flags_caller
mm: remove both instances of __vmalloc_node_flags
mm: remove the prot argument to __vmalloc_node
mm: remove the pgprot argument to __vmalloc
...

Linus Torvalds
2020-06-03 03:21:36 +0800
a37b0715d mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE ... Browse Code »

PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
daemon needs to write to one bdi (the final bdi) in order to free up
writes queued to another bdi (the client bdi).

The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
pages, so that it can still dirty pages after other processses have been
throttled. The purpose of this is to avoid deadlock that happen when
the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
but it is being thottled and cannot write.

This approach was designed when all threads were blocked equally,
independently on which device they were writing to, or how fast it was.
Since that time the writeback algorithm has changed substantially with
different threads getting different allowances based on non-trivial
heuristics. This means the simple "add 25%" heuristic is no longer
reliable.

The important issue is not that the daemon needs a *larger* dirty page
allowance, but that it needs a *private* dirty page allowance, so that
dirty pages for the "client" bdi that it is helping to clear (the bdi
for an NFS filesystem or loop block device etc) do not affect the
throttling of the daemon writing to the "final" bdi.

This patch changes the heuristic so that the task is not throttled when
the bdi it is writing to has a dirty page count below below (or equal
to) the free-run threshold for that bdi. This ensures it will always be
able to have some pages in flight, and so will not deadlock.

In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
still be throttled by global threshold, but that is acceptable as it is
only the deadlock state that is interesting for this flag.

This approach of "only throttle when target bdi is busy" is consistent
with the other use of PF_LESS_THROTTLE in current_may_throttle(), were
it causes attention to be focussed only on the target bdi.

So this patch
- renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
- removes the 25% bonus that that flag gives, and
- If PF_LOCAL_THROTTLE is set, don't delay at all unless the
global and the local free-run thresholds are exceeded.

Note that previously realtime threads were treated the same as
PF_LESS_THROTTLE threads. This patch does *not* change the behvaiour
for real-time threads, so it is now different from the behaviour of nfsd
and loop tasks. I don't know what is wanted for realtime.

[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: NeilBrown
Signed-off-by: Andrew Morton
Reviewed-by: Jan Kara
Acked-by: Chuck Lever [nfsd]
Cc: Christoph Hellwig
Cc: Michal Hocko
Cc: Trond Myklebust
Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds

NeilBrown
2020-06-03 01:59:08 +0800

26 Apr, 2020

1 commit

ce5155c4f compat sysinfo(2): don't bother with field-by-field copyout ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2020-04-26 06:06:05 +0800

16 Mar, 2020

1 commit

ae56fd997 Merge 5.6-rc6 into android-mainline ... Browse Code »

Linux 5.6-rc6

Signed-off-by: Greg Kroah-Hartman
Change-Id: I6c2d7aff44ad5a9b75030b72d34ca5dbd5ad3ceb

Greg Kroah-Hartman
2020-03-16 15:09:43 +0800

04 Mar, 2020

1 commit

ecc421e05 sys/sysinfo: Respect boottime inside time namespace ... Browse Code »

The sysinfo() syscall includes uptime in seconds but has no correction for
time namespaces which makes it inconsistent with the /proc/uptime inside of
a time namespace.

Add the missing time namespace adjustment call.

Signed-off-by: Cyril Hrubis
Signed-off-by: Thomas Gleixner
Reviewed-by: Dmitry Safonov
Link: https://lkml.kernel.org/r/20200303150638.7329-1-chrubis@suse.cz

Cyril Hrubis
2020-03-04 02:34:32 +0800

03 Feb, 2020

1 commit

7881aee54 Merge 39bed42de2e7 ("Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/l… ... Browse Code »

…inux/kernel/git/rdma/rdma") into android-mainline

Baby steps in the 5.6-rc1 merge cycle to make things easier to review
and debug.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0fa183764fd1adbde44e8181f0b3df6cff4da18b

Greg Kroah-Hartman
2020-02-03 23:34:37 +0800

28 Jan, 2020

1 commit

8d19f1c8e prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim ... Browse Code »

There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
amd nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace deamons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send SG IO or read/write IO to figure out the state of paths and re-set
them up.

In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
The device is then in a state of deadlock, because to execute IO the
device needs to allocate memory, but to allocate memory the memory
layers want execute IO to the device.

Here is an example with nbd using a local userspace daemon that performs
network IO to a remote server. We are using XFS on top of the nbd device,
but it can happen with any FS or other modules layered on top of the nbd
device that can write out data to free memory. Here a nbd daemon helper
thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute
a request. This kicks off a reclaim operation which results in a WRITE to
the nbd device and the nbd thread calling back into the mm layer.

[ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000
[ 1626.609193] Call Trace:
[ 1626.609195] ? __schedule+0x29b/0x630
[ 1626.609197] ? wait_for_completion+0xe0/0x170
[ 1626.609198] schedule+0x30/0xb0
[ 1626.609200] schedule_timeout+0x1f6/0x2f0
[ 1626.609202] ? blk_finish_plug+0x21/0x2e
[ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410
[ 1626.609206] ? wait_for_completion+0xe0/0x170
[ 1626.609208] wait_for_completion+0x108/0x170
[ 1626.609210] ? wake_up_q+0x70/0x70
[ 1626.609212] ? __xfs_buf_submit+0x12e/0x250
[ 1626.609214] ? xfs_bwrite+0x25/0x60
[ 1626.609215] xfs_buf_iowait+0x22/0xf0
[ 1626.609218] __xfs_buf_submit+0x12e/0x250
[ 1626.609220] xfs_bwrite+0x25/0x60
[ 1626.609222] xfs_reclaim_inode+0x2e8/0x310
[ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300
[ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40
[ 1626.609228] super_cache_scan+0x152/0x1a0
[ 1626.609231] do_shrink_slab+0x12c/0x2d0
[ 1626.609233] shrink_slab+0x9c/0x2a0
[ 1626.609235] shrink_node+0xd7/0x470
[ 1626.609237] do_try_to_free_pages+0xbf/0x380
[ 1626.609240] try_to_free_pages+0xd9/0x1f0
[ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30
[ 1626.609251] ? ___slab_alloc+0x238/0x560
[ 1626.609254] __alloc_pages_nodemask+0x30c/0x350
[ 1626.609259] skb_page_frag_refill+0x97/0xd0
[ 1626.609274] sk_page_frag_refill+0x1d/0x80
[ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0
[ 1626.609304] tcp_sendmsg+0x27/0x40
[ 1626.609307] sock_sendmsg+0x54/0x60
[ 1626.609308] ___sys_sendmsg+0x29f/0x320
[ 1626.609313] ? sock_poll+0x66/0xb0
[ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0
[ 1626.609320] ? ep_send_events_proc+0xe6/0x230
[ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0
[ 1626.609324] ? ep_read_events_proc+0xc0/0xc0
[ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20
[ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230
[ 1626.609329] ? __hrtimer_init+0xb0/0xb0
[ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20
[ 1626.609334] ? ep_poll+0x26c/0x4a0
[ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0
[ 1626.609339] ? release_sock+0x43/0x90
[ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20
[ 1626.609342] __sys_sendmsg+0x47/0x80
[ 1626.609347] do_syscall_64+0x5f/0x1c0
[ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0
[ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9

This patch adds a new prctl command that daemons can use after they have
done their initial setup, and before they start to do allocations that
are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE
flags so both userspace block and FS threads can use it to avoid the
allocation recursion and try to prevent from being throttled while
writing out data to free up memory.

Signed-off-by: Mike Christie
Acked-by: Michal Hocko
Tested-by: Masato Suzuki
Reviewed-by: Damien Le Moal
Reviewed-by: Bart Van Assche
Reviewed-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Link: https://lore.kernel.org/r/20191112001900.9206-1-mchristi@redhat.com
Signed-off-by: Christian Brauner

Mike Christie
2020-01-28 17:09:51 +0800

09 Dec, 2019

1 commit

d3a196a37 Merge 5.5-rc1 into android-mainline ... Browse Code »

Linux 5.5-rc1

Signed-off-by: Greg Kroah-Hartman
Change-Id: I6f952ebdd40746115165a2f99bab340482f5c237

Greg Kroah-Hartman
2019-12-09 19:12:00 +0800

05 Dec, 2019

1 commit

5e1aada08 kernel/sys.c: avoid copying possible padding bytes in copy_to_user ... Browse Code »

Initialization is not guaranteed to zero padding bytes so use an
explicit memset instead to avoid leaking any kernel content in any
possible padding bytes.

Link: http://lkml.kernel.org/r/dfa331c00881d61c8ee51577a082d8bebd61805c.camel@perches.com
Signed-off-by: Joe Perches
Cc: Dan Carpenter
Cc: Julia Lawall
Cc: Thomas Gleixner
Cc: Kees Cook
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2019-12-05 11:44:12 +0800

15 Nov, 2019

1 commit

bdd565f81 y2038: rusage: use __kernel_old_timeval ... Browse Code »

There are two 'struct timeval' fields in 'struct rusage'.

Unfortunately the definition of timeval is now ambiguous when used in
user space with a libc that has a 64-bit time_t, and this also changes
the 'rusage' definition in user space in a way that is incompatible with
the system call interface.

While there is no good solution to avoid all ambiguity here, change
the definition in the kernel headers to be compatible with the kernel
ABI, using __kernel_old_timeval as an unambiguous base type.

In previous discussions, there was also a plan to add a replacement
for rusage based on 64-bit timestamps and nanosecond resolution,
i.e. 'struct __kernel_timespec'. I have patches for that as well,
if anyone thinks we should do that.

Reviewed-by: Cyrill Gorcunov
Signed-off-by: Arnd Bergmann

Arnd Bergmann
2019-11-15 21:38:29 +0800

21 Sep, 2019

1 commit

bfa0399bc Merge Linus's 5.4-rc1-prerelease branch into android-mainline ... Browse Code »

This merges Linus's tree as of commit b41dae061bbd ("Merge tag
'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
into android-mainline.

This "early" merge makes it easier to test and handle merge conflicts
instead of having to wait until the "end" of the merge window and handle
all 10000+ commits at once.

Signed-off-by: Greg Kroah-Hartman
Change-Id: I6bebf55e5e2353f814e3c87f5033607b1ae5d812

Greg Kroah-Hartman
2019-09-21 07:07:54 +0800

18 Sep, 2019

1 commit

7f2444d38 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core timer updates from Thomas Gleixner:
"Timers and timekeeping updates:

- A large overhaul of the posix CPU timer code which is a preparation
for moving the CPU timer expiry out into task work so it can be
properly accounted on the task/process.

An update to the bogus permission checks will come later during the
merge window as feedback was not complete before heading of for
travel.

- Switch the timerqueue code to use cached rbtrees and get rid of the
homebrewn caching of the leftmost node.

- Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
single function

- Implement the separation of hrtimers to be forced to expire in hard
interrupt context even when PREEMPT_RT is enabled and mark the
affected timers accordingly.

- Implement a mechanism for hrtimers and the timer wheel to protect
RT against priority inversion and live lock issues when a (hr)timer
which should be canceled is currently executing the callback.
Instead of infinitely spinning, the task which tries to cancel the
timer blocks on a per cpu base expiry lock which is held and
released by the (hr)timer expiry code.

- Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
resulting in faster access to timekeeping functions.

- Updates to various clocksource/clockevent drivers and their device
tree bindings.

- The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
posix-cpu-timers: Fix permission check regression
posix-cpu-timers: Always clear head pointer on dequeue
hrtimer: Add a missing bracket and hide `migration_base' on !SMP
posix-cpu-timers: Make expiry_active check actually work correctly
posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
tick: Mark sched_timer to expire in hard interrupt context
hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
posix-cpu-timers: Utilize timerqueue for storage
posix-cpu-timers: Move state tracking to struct posix_cputimers
posix-cpu-timers: Deduplicate rlimit handling
posix-cpu-timers: Remove pointless comparisons
posix-cpu-timers: Get rid of 64bit divisions
posix-cpu-timers: Consolidate timer expiry further
posix-cpu-timers: Get rid of zero checks
rlimit: Rewrite non-sensical RLIMIT_CPU comment
posix-cpu-timers: Respect INFINITY for hard RTTIME limit
posix-cpu-timers: Switch thread group sampling to array
posix-cpu-timers: Restructure expiry array
posix-cpu-timers: Remove cputime_expires
...

Linus Torvalds
2019-09-18 03:35:15 +0800

17 Sep, 2019

1 commit

22331f895 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull x86 cpu-feature updates from Ingo Molnar:

- Rework the Intel model names symbols/macros, which were decades of
ad-hoc extensions and added random noise. It's now a coherent, easy
to follow nomenclature.

- Add new Intel CPU model IDs:
- "Tiger Lake" desktop and mobile models
- "Elkhart Lake" model ID
- and the "Lightning Mountain" variant of Airmont, plus support code

- Add the new AVX512_VP2INTERSECT instruction to cpufeatures

- Remove Intel MPX user-visible APIs and the self-tests, because the
toolchain (gcc) is not supporting it going forward. This is the
first, lowest-risk phase of MPX removal.

- Remove X86_FEATURE_MFENCE_RDTSC

- Various smaller cleanups and fixes

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/cpu: Update init data for new Airmont CPU model
x86/cpu: Add new Airmont variant to Intel family
x86/cpu: Add Elkhart Lake to Intel family
x86/cpu: Add Tiger Lake to Intel family
x86: Correct misc typos
x86/intel: Add common OPTDIFFs
x86/intel: Aggregate microserver naming
x86/intel: Aggregate big core graphics naming
x86/intel: Aggregate big core mobile naming
x86/intel: Aggregate big core client naming
x86/cpufeature: Explain the macro duplication
x86/ftrace: Remove mcount() declaration
x86/PCI: Remove superfluous returns from void functions
x86/msr-index: Move AMD MSRs where they belong
x86/cpu: Use constant definitions for CPU models
lib: Remove redundant ftrace flag removal
x86/crash: Remove unnecessary comparison
x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE()
x86: Remove X86_FEATURE_MFENCE_RDTSC
x86/mpx: Remove MPX APIs
...

Linus Torvalds
2019-09-17 09:47:53 +0800

28 Aug, 2019

2 commits

2bbdbdae0 posix-cpu-timers: Get rid of zero checks ... Browse Code »

Deactivation of the expiry cache is done by setting all clock caches to
0. That requires to have a check for zero in all places which update the
expiry cache:

if (cache == 0 || new < cache)
cache = new;

Use U64_MAX as the deactivated value, which allows to remove the zero
checks when updating the cache and reduces it to the obvious check:

if (new < cache)
cache = new;

This also removes the weird workaround in do_prlimit() which was required
to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing
in 0 to the posix CPU timer code would have effectively disarmed it.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.275086128@linutronix.de

Thomas Gleixner
2019-08-28 17:50:40 +0800
24db4dd90 rlimit: Rewrite non-sensical RLIMIT_CPU comment ... Browse Code »

The comment above the function which arms RLIMIT_CPU in the posix CPU timer
code makes no sense at all. It claims that the kernel does not return an
error code when it rejected the attempt to set RLIMIT_CPU. That's clearly
bogus as the code does an error check and the rlimit is only set and
activated when the permission checks are ok. In case of a rejection an
appropriate error code is returned.

This is a historical and outdated comment which got dragged along even when
the rlimit handling code was rewritten.

Replace it with an explanation why the setup function is not called when
the rlimit value is RLIM_INFINITY and how the 'disarming' is handled.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lkml.kernel.org/r/20190821192922.185511287@linutronix.de

Thomas Gleixner
2019-08-28 17:50:40 +0800

21 Aug, 2019

1 commit

3e91ec89f arm64: Tighten the PR_{SET, GET}_TAGGED_ADDR_CTRL prctl() unused arguments ... Browse Code »

Require that arg{3,4,5} of the PR_{SET,GET}_TAGGED_ADDR_CTRL prctl and
arg2 of the PR_GET_TAGGED_ADDR_CTRL prctl() are zero rather than ignored
for future extensions.

Acked-by: Andrey Konovalov
Signed-off-by: Catalin Marinas
Signed-off-by: Will Deacon

Catalin Marinas
2019-08-21 01:17:55 +0800

07 Aug, 2019

1 commit

63f0c6037 arm64: Introduce prctl() options to control the tagged user addresses ABI ... Browse Code »

It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve(). A Kconfig
option allows the overall disabling of the relaxed ABI.

The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.

Reviewed-by: Kees Cook
Signed-off-by: Catalin Marinas
Signed-off-by: Andrey Konovalov
Signed-off-by: Will Deacon

Catalin Marinas
2019-08-07 01:08:45 +0800

22 Jul, 2019

1 commit

f240652b6 x86/mpx: Remove MPX APIs ... Browse Code »

MPX is being removed from the kernel due to a lack of support in the
toolchain going forward (gcc).

The first step is to remove the userspace-visible ABIs so that applications
will stop using it. The most visible one are the enable/disable prctl()s.
Remove them first.

This is the most minimal and least invasive change needed to ensure that
apps stop using MPX with new kernels.

Signed-off-by: Dave Hansen
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20190705175321.DB42F0AD@viggo.jf.intel.com

Dave Hansen
2019-07-22 17:54:57 +0800

03 Jun, 2019

1 commit

eae1a1b5a Merge Linux 5.2-rc3 into android-mainline ... Browse Code »

Linux 5.2-rc3

Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2019-06-03 17:50:00 +0800