Eric Lee / smarc-fsl-linux-kernel

13 Jan, 2021

1 commit

2179bae04 local64.h: make <asm/local64.h> mandatory ... Browse Code »

[ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ]

Make mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include .

This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.

Build-tested on 21 of 25 arch-es. (tools problems on the others)

Yes, we could even rename to
and change all #includes to use
instead.

Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap
Suggested-by: Christoph Hellwig
Reviewed-by: Masahiro Yamada
Cc: Jens Axboe
Cc: Ley Foon Tan
Cc: Mark Salter
Cc: Aurelien Jacquiot
Cc: Peter Zijlstra
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

Randy Dunlap
2021-01-13 03:18:16 +0800

24 Nov, 2020

1 commit

58c644ba5 sched/idle: Fix arch_cpu_idle() vs tracing ... Browse Code »

We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)

Reported-by: Sven Schnelle
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Mark Rutland
Tested-by: Mark Rutland
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org

Peter Zijlstra
2020-11-24 23:47:35 +0800

24 Oct, 2020

1 commit

4a22709e2 Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block ... Browse Code »

Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:

- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().

- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"

* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()

Linus Torvalds
2020-10-24 01:06:38 +0800

23 Oct, 2020

1 commit

f56e65dff Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode

Linus Torvalds
2020-10-23 00:59:21 +0800

19 Oct, 2020

1 commit

ecb8ac8b1 mm/madvise: introduce process_madvise() syscall: an external memory hinting API ... Browse Code »

There is usecase that System Management Software(SMS) want to give a
memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the
case of Android, it is the ActivityManagerService.

The information required to make the reclaim decision is not known to the
app. Instead, it is known to the centralized userspace
daemon(ActivityManagerService), and that daemon must be able to initiate
reclaim on its own without any app involvement.

To solve the issue, this patch introduces a new syscall
process_madvise(2). It uses pidfd of an external process to give the
hint. It also supports vector address range because Android app has
thousands of vmas due to zygote so it's totally waste of CPU and power if
we should call the syscall one by one for each vma.(With testing 2000-vma
syscall vs 1-vector syscall, it showed 15% performance improvement. I
think it would be bigger in real practice because the testing ran very
cache friendly environment).

Another potential use case for the vector range is to amortize the cost
ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations. In
future, we could find more usecases for other advises so let's make it
happens as API since we introduce a new syscall at this moment. With
that, existing madvise(2) user could replace it with process_madvise(2)
with their own pid if they want to have batch address ranges support
feature.

ince it could affect other process's address range, only privileged
process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same
UID) gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky. Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone. Thus,
I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

If someone want to add other hints, we could hear the usecase and review
it for each hint. It's safer for maintenance rather than introducing a
buggy syscall but hard to fix it later.

So finally, the API is as follows,

ssize_t process_madvise(int pidfd, const struct iovec *iovec,
unsigned long vlen, int advice, unsigned int flags);

DESCRIPTION
The process_madvise() system call is used to give advice or directions
to the kernel about the address ranges from external process as well as
local process. It provides the advice to address ranges of process
described by iovec and vlen. The goal of such advice is to improve
system or application performance.

The pidfd selects the process referred to by the PID file descriptor
specified in pidfd. (See pidofd_open(2) for further information)

The pointer iovec points to an array of iovec structures, defined in
as:

struct iovec {
void *iov_base; /* starting address */
size_t iov_len; /* number of bytes to be advised */
};

The iovec describes address ranges beginning at address(iov_base)
and with size length of bytes(iov_len).

The vlen represents the number of elements in iovec.

The advice is indicated in the advice argument, which is one of the
following at this moment if the target process specified by pidfd is
external.

MADV_COLD
MADV_PAGEOUT

Permission to provide a hint to external process is governed by a
ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

The process_madvise supports every advice madvise(2) has if target
process is in same thread group with calling process so user could
use process_madvise(2) to extend existing madvise(2) to support
vector address ranges.

RETURN VALUE
On success, process_madvise() returns the number of bytes advised.
This return value may be less than the total number of requested
bytes, if an error occurred. The caller should check return value
to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep

"For Android, every application (including the special SystemServer)
are forked from Zygote. The reason of course is to share as many
libraries and classes between the two as possible to benefit from the
preloading during boot.

After applications start, (almost) all of the APIs end up calling into
this SystemServer process over IPC (binder) and back to the
application.

In a fully running system, the SystemServer monitors every single
process periodically to calculate their PSS / RSS and also decides
which process is "important" to the user for interactivity.

So, because of how these processes start _and_ the fact that the
SystemServer is looping to monitor each process, it does tend to *know*
which address range of the application is not used / useful.

Besides, we can never rely on applications to clean things up
themselves. We've had the "hey app1, the system is low on memory,
please trim your memory usage down" notifications for a long time[1].
They rely on applications honoring the broadcasts and very few do.

So, if we want to avoid the inevitable killing of the application and
restarting it, some way to be able to tell the OS about unimportant
memory in these applications will be useful.

- ssp

Q.2 - How to guarantee the race(i.e., object validation) between when
giving a hint from an external process and get the hint from the target
process?

process_madvise operates on the target process's address space as it
exists at the instant that process_madvise is called. If the space
target process can run between the time the process_madvise process
inspects the target process address space and the time that
process_madvise is actually called, process_madvise may operate on
memory regions that the calling process does not expect. It's the
responsibility of the process calling process_madvise to close this
race condition. For example, the calling process can suspend the
target process with ptrace, SIGSTOP, or the freezer cgroup so that it
doesn't have an opportunity to change its own address space before
process_madvise is called. Another option is to operate on memory
regions that the caller knows a priori will be unchanged in the target
process. Yet another option is to accept the race for certain
process_madvise calls after reasoning that mistargeting will do no
harm. The suggested API itself does not provide synchronization. It
also apply other APIs like move_pages, process_vm_write.

The race isn't really a problem though. Why is it so wrong to require
that callers do their own synchronization in some manner? Nobody
objects to write(2) merely because it's possible for two processes to
open the same file and clobber each other's writes --- instead, we tell
people to use flock or something. Think about mmap. It never
guarantees newly allocated address space is still valid when the user
tries to access it because other threads could unmap the memory right
before. That's where we need synchronization by using other API or
design from userside. It shouldn't be part of API itself. If someone
needs more fine-grained synchronization rather than process level,
there were two ideas suggested - cookie[2] and anon-fd[3]. Both are
applicable via using last reserved argument of the API but I don't
think it's necessary right now since we have already ways to prevent
the race so don't want to add additional complexity with more
fine-grained optimization model.

To make the API extend, it reserved an unsigned long as last argument
so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work
for us because such injected madvise would have to be executed by the
target process, which means that process would have to be runnable and
that creates the risk of the abovementioned race and hinting a wrong
VMA. Furthermore, we want to act the hint in caller's context, not the
callee's, because the callee is usually limited in cpuset/cgroups or
even freezed state so they can't act by themselves quick enough, which
causes more thrashing/kill. It doesn't work if the target process are
ptraced(e.g., strace, debugger, minidump) because a process can have at
most one ptracer.

[1] https://developer.android.com/topic/performance/memory"

[2] process_getinfo for getting the cookie which is updated whenever
vma of process address layout are changed - Daniel Colascione -
https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224

[3] anonymous fd which is used for the object(i.e., address range)
validation - Michal Hocko -
https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au

Signed-off-by: Minchan Kim
Signed-off-by: YueHaibing
Signed-off-by: Stephen Rothwell
Signed-off-by: Andrew Morton
Reviewed-by: Suren Baghdasaryan
Reviewed-by: Vlastimil Babka
Acked-by: David Rientjes
Cc: Alexander Duyck
Cc: Brian Geffon
Cc: Christian Brauner
Cc: Daniel Colascione
Cc: Jann Horn
Cc: Jens Axboe
Cc: Joel Fernandes
Cc: Johannes Weiner
Cc: John Dias
Cc: Kirill Tkhai
Cc: Michal Hocko
Cc: Oleksandr Natalenko
Cc: Sandeep Patil
Cc: SeongJae Park
Cc: SeongJae Park
Cc: Shakeel Butt
Cc: Sonny Rao
Cc: Tim Murray
Cc: Christian Brauner
Cc: Florian Weimer
Cc:
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds

Minchan Kim
2020-10-19 00:27:10 +0800

18 Oct, 2020

1 commit

3c532798e tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume() ... Browse Code »

All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.

Reviewed-by: Oleg Nesterov
Reviewed-by: Thomas Gleixner
Signed-off-by: Jens Axboe

Jens Axboe
2020-10-18 05:04:36 +0800

16 Oct, 2020

1 commit

5a32c3413 Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping ... Browse Code »

Pull dma-mapping updates from Christoph Hellwig:

- rework the non-coherent DMA allocator

- move private definitions out of

- lower CMA_ALIGNMENT (Paul Cercueil)

- remove the omap1 dma address translation in favor of the common code

- make dma-direct aware of multiple dma offset ranges (Jim Quinlan)

- support per-node DMA CMA areas (Barry Song)

- increase the default seg boundary limit (Nicolin Chen)

- misc fixes (Robin Murphy, Thomas Tai, Xu Wang)

- various cleanups

* tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits)
ARM/ixp4xx: add a missing include of dma-map-ops.h
dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling
dma-direct: factor out a dma_direct_alloc_from_pool helper
dma-direct check for highmem pages in dma_direct_alloc_pages
dma-mapping: merge into
dma-mapping: move large parts of to kernel/dma
dma-mapping: move dma-debug.h to kernel/dma/
dma-mapping: remove
dma-mapping: merge into
dma-contiguous: remove dma_contiguous_set_default
dma-contiguous: remove dev_set_cma_area
dma-contiguous: remove dma_declare_contiguous
dma-mapping: split
cma: decrease CMA_ALIGNMENT lower limit to 2
firewire-ohci: use dma_alloc_pages
dma-iommu: implement ->alloc_noncoherent
dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods
dma-mapping: add a new dma_alloc_pages API
dma-mapping: remove dma_cache_sync
53c700: convert to dma_alloc_noncoherent
...

Linus Torvalds
2020-10-16 05:43:29 +0800

13 Oct, 2020

2 commits

c90578360 Merge branch 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull copy_and_csum cleanups from Al Viro:
"Saner calling conventions for csum_and_copy_..._user() and friends"

[ Removing 800+ lines of code and cleaning stuff up is good - Linus ]

* 'work.csum_and_copy' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ppc: propagate the calling conventions change down to csum_partial_copy_generic()
amd64: switch csum_partial_copy_generic() to new calling conventions
sparc64: propagate the calling convention changes down to __csum_partial_copy_...()
xtensa: propagate the calling conventions change down into csum_partial_copy_generic()
mips: propagate the calling convention change down into __csum_partial_copy_..._user()
mips: __csum_partial_copy_kernel() has no users left
mips: csum_and_copy_{to,from}_user() are never called under KERNEL_DS
sparc32: propagate the calling conventions change down to __csum_partial_copy_sparc_generic()
i386: propagate the calling conventions change down to csum_partial_copy_generic()
sh: propage the calling conventions change down to csum_partial_copy_generic()
m68k: get rid of zeroing destination on error in csum_and_copy_from_user()
arm: propagate the calling convention changes down to csum_partial_copy_from_user()
alpha: propagate the calling convention changes down to csum_partial_copy.c helpers
saner calling conventions for csum_and_copy_..._user()
csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum
csum_partial_copy_nocheck(): drop the last argument
unify generic instances of csum_partial_copy_nocheck()
icmp_push_reply(): reorder adding the checksum up
skb_copy_and_csum_bits(): don't bother with the last argument

Linus Torvalds
2020-10-13 07:24:13 +0800
34eb62d86 Merge tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.

Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.

And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"

* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...

Linus Torvalds
2020-10-13 04:39:19 +0800

06 Oct, 2020

1 commit

0a0f0d8be dma-mapping: split <linux/dma-mapping.h> ... Browse Code »

Split out all the bits that are purely for dma_map_ops implementations
and related code into a new header so that they
don't get pulled into all the drivers. That also means the architecture
specific is not pulled in by
any more, which leads to a missing includes that were pulled in by the
x86 or arm versions in a few not overly portable drivers.

Signed-off-by: Christoph Hellwig

Christoph Hellwig
2020-10-06 13:07:03 +0800

25 Sep, 2020

1 commit

efa70f2fd dma-mapping: add a new dma_alloc_pages API ... Browse Code »

This API is the equivalent of alloc_pages, except that the returned memory
is guaranteed to be DMA addressable by the passed in device. The
implementation will also be used to provide a more sensible replacement
for DMA_ATTR_NON_CONSISTENT flag.

Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages
as its backend.

Signed-off-by: Christoph Hellwig
Acked-by: Thomas Bogendoerfer (MIPS part)

Christoph Hellwig
2020-09-25 12:20:47 +0800

09 Sep, 2020

1 commit

5e6e9852d uaccess: add infrastructure for kernel builds with set_fs() ... Browse Code »

Add a CONFIG_SET_FS option that is selected by architecturess that
implement set_fs, which is all of them initially. If the option is not
set stubs for routines related to overriding the address space are
provided so that architectures can start to opt out of providing set_fs.

Signed-off-by: Christoph Hellwig
Reviewed-by: Kees Cook
Signed-off-by: Al Viro

Christoph Hellwig
2020-09-09 10:21:32 +0800

04 Sep, 2020

1 commit

1e9d90dbe dma-mapping: introduce dma_get_seg_boundary_nr_pages() ... Browse Code »

We found that callers of dma_get_seg_boundary mostly do an ALIGN
with page mask and then do a page shift to get number of pages:
ALIGN(boundary + 1, 1 << shift) >> shift

However, the boundary might be as large as ULONG_MAX, which means
that a device has no specific boundary limit. So either "+ 1" or
passing it to ALIGN() would potentially overflow.

According to kernel defines:
#define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
#define ALIGN(x, a) ALIGN_MASK(x, (typeof(x))(a) - 1)

We can simplify the logic here into a helper function doing:
ALIGN(boundary + 1, 1 << shift) >> shift
= ALIGN_MASK(b + 1, (1 << s) - 1) >> s
= {[b + 1 + (1 << s) - 1] & ~[(1 << s) - 1]} >> s
= [b + 1 + (1 << s) - 1] >> s
= [b + (1 << s)] >> s
= (b >> s) + 1

This patch introduces and applies dma_get_seg_boundary_nr_pages()
as an overflow-free helper for the dma_get_seg_boundary() callers
to get numbers of pages. It also takes care of the NULL dev case
for non-DMA API callers.

Suggested-by: Christoph Hellwig
Signed-off-by: Nicolin Chen
Acked-by: Niklas Schnelle
Acked-by: Michael Ellerman (powerpc)
Signed-off-by: Christoph Hellwig

Nicolin Chen
2020-09-04 00:12:15 +0800

01 Sep, 2020

1 commit

c604abc3f vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG ... Browse Code »

The .comment section doesn't belong in STABS_DEBUG. Split it out into a
new macro named ELF_DETAILS. This will gain other non-debug sections
that need to be accounted for when linking with --orphan-handling=warn.

Signed-off-by: Kees Cook
Signed-off-by: Ingo Molnar
Cc: linux-arch@vger.kernel.org
Link: https://lore.kernel.org/r/20200821194310.3089815-5-keescook@chromium.org

Kees Cook
2020-09-01 15:50:35 +0800

24 Aug, 2020

1 commit

df561f668 treewide: Use fallthrough pseudo-keyword ... Browse Code »

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva

Gustavo A. R. Silva
2020-08-24 06:36:59 +0800

21 Aug, 2020

4 commits

b71213954 alpha: propagate the calling convention changes down to csum_partial_copy.c helpers ... Browse Code »

get rid of set_fs() in csum_partial_copy_nocheck(), while we are at it -
just take the part of csum_and_copy_from_user() sans the access_ok() check
into a helper function and have csum_partial_copy_nocheck() call that.

Signed-off-by: Al Viro

Al Viro
2020-08-21 03:45:16 +0800
c693cc467 saner calling conventions for csum_and_copy_..._user() ... Browse Code »

All callers of these primitives will
* discard anything we might've copied in case of error
* ignore the csum value in case of error
* always pass 0xffffffff as the initial sum, so the
resulting csum value (in case of success, that is) will never be 0.

That suggest the following calling conventions:
* don't pass err_ptr - just return 0 on error.
* don't bother with zeroing destination, etc. in case of error
* don't pass the initial sum - just use 0xffffffff.

This commit does the minimal conversion in the instances of csum_and_copy_...();
the changes of actual asm code behind them are done later in the series.
Note that this asm code is often shared with csum_partial_copy_nocheck();
the difference is that csum_partial_copy_nocheck() passes 0 for initial
sum while csum_and_copy_..._user() pass 0xffffffff. Fortunately, we are
free to pass 0xffffffff in all cases and subsequent patches will use that
freedom without any special comments.

A part that could be split off: parisc and uml/i386 claimed to have
csum_and_copy_to_user() instances of their own, but those were identical
to the generic one, so we simply drop them. Not sure if it's worth
a separate commit...

Signed-off-by: Al Viro

Al Viro
2020-08-21 03:45:15 +0800
cc44c17ba csum_partial_copy_nocheck(): drop the last argument ... Browse Code »

It's always 0. Note that we theoretically could use ~0U as well -
result will be the same modulo 0xffff, _if_ the damn thing did the
right thing for any value of initial sum; later we'll make use of
that when convenient.

However, unlike csum_and_copy_..._user(), there are instances that
did not work for arbitrary initial sums; c6x is one such.

Signed-off-by: Al Viro

Al Viro
2020-08-21 03:45:14 +0800
6e41c585e unify generic instances of csum_partial_copy_nocheck() ... Browse Code »

quite a few architectures have the same csum_partial_copy_nocheck() -
simply memcpy() the data and then return the csum of the copy.

hexagon, parisc, ia64, s390, um: explicitly spelled out that way.

arc, arm64, csky, h8300, m68k/nommu, microblaze, mips/GENERIC_CSUM, nds32,
nios2, openrisc, riscv, unicore32: end up picking the same thing spelled
out in lib/checksum.h (with varying amounts of perversions along the way).

everybody else (alpha, arm, c6x, m68k/mmu, mips/!GENERIC_CSUM, powerpc,
sh, sparc, x86, xtensa) have non-generic variants. For all except c6x
the declaration is in their asm/checksum.h. c6x uses the wrapper
from asm-generic/checksum.h that would normally lead to the lib/checksum.h
instance, but in case of c6x we end up using an asm function from arch/c6x
instead.

Screw that mess - have architectures with private instances define
_HAVE_ARCH_CSUM_AND_COPY in their asm/checksum.h and have the default
one right in net/checksum.h conditional on _HAVE_ARCH_CSUM_AND_COPY
*not* defined.

Signed-off-by: Al Viro

Al Viro
2020-08-21 03:45:14 +0800

15 Aug, 2020

2 commits

8f28ca6bd iomap: constify ioreadX() iomem argument (as in generic implementation) ... Browse Code »

Patch series "iomap: Constify ioreadX() iomem argument", v3.

The ioread8/16/32() and others have inconsistent interface among the
architectures: some taking address as const, some not.

It seems there is nothing really stopping all of them to take pointer to
const.

This patch (of 4):

The ioreadX() and ioreadX_rep() helpers have inconsistent interface. On
some architectures void *__iomem address argument is a pointer to const,
on some not.

Implementations of ioreadX() do not modify the memory under the address so
they can be converted to a "const" version for const-safety and
consistency among architectures.

[krzk@kernel.org: sh: clk: fix assignment from incompatible pointer type for ioreadX()]
Link: http://lkml.kernel.org/r/20200723082017.24053-1-krzk@kernel.org
[akpm@linux-foundation.org: fix drivers/mailbox/bcm-pdc-mailbox.c]
Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%25lkp@intel.com

Suggested-by: Geert Uytterhoeven
Signed-off-by: Krzysztof Kozlowski
Signed-off-by: Andrew Morton
Reviewed-by: Geert Uytterhoeven
Reviewed-by: Arnd Bergmann
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: "James E.J. Bottomley"
Cc: Helge Deller
Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: Kalle Valo
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Dave Jiang
Cc: Jon Mason
Cc: Allen Hubbe
Cc: "Michael S. Tsirkin"
Cc: Jason Wang
Link: http://lkml.kernel.org/r/20200709072837.5869-1-krzk@kernel.org
Link: http://lkml.kernel.org/r/20200709072837.5869-2-krzk@kernel.org
Signed-off-by: Linus Torvalds

Krzysztof Kozlowski
2020-08-15 10:56:57 +0800
88db0aa24 all arch: remove system call sys_sysctl ... Browse Code »

Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.

We have been warning about people using the sysctl system call for years
and believe there are no more users. Even if there are users of this
interface if they have not complained or fixed their code by now they
probably are not going to, so there is no point in warning them any
longer.

So completely remove sys_sysctl on all architectures.

[nixiaoming@huawei.com: s390: fix build error for sys_call_table_emu]
Link: http://lkml.kernel.org/r/20200618141426.16884-1-nixiaoming@huawei.com

Signed-off-by: Xiaoming Ni
Signed-off-by: Andrew Morton
Acked-by: Will Deacon [arm/arm64]
Acked-by: "Eric W. Biederman"
Cc: Aleksa Sarai
Cc: Alexander Shishkin
Cc: Al Viro
Cc: Andi Kleen
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Arnaldo Carvalho de Melo
Cc: Arnd Bergmann
Cc: Benjamin Herrenschmidt
Cc: Bin Meng
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Catalin Marinas
Cc: chenzefeng
Cc: Christian Borntraeger
Cc: Christian Brauner
Cc: Chris Zankel
Cc: David Howells
Cc: David S. Miller
Cc: Diego Elio Pettenò
Cc: Dmitry Vyukov
Cc: Dominik Brodowski
Cc: Fenghua Yu
Cc: Geert Uytterhoeven
Cc: Heiko Carstens
Cc: Helge Deller
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Iurii Zaikin
Cc: Ivan Kokshaysky
Cc: James Bottomley
Cc: Jens Axboe
Cc: Jiri Olsa
Cc: Kars de Jong
Cc: Kees Cook
Cc: Krzysztof Kozlowski
Cc: Luis Chamberlain
Cc: Marco Elver
Cc: Mark Rutland
Cc: Martin K. Petersen
Cc: Masahiro Yamada
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Miklos Szeredi
Cc: Minchan Kim
Cc: Namhyung Kim
Cc: Naveen N. Rao
Cc: Nick Piggin
Cc: Oleg Nesterov
Cc: Olof Johansson
Cc: Paul Burton
Cc: "Paul E. McKenney"
Cc: Paul Mackerras
Cc: Peter Zijlstra (Intel)
Cc: Randy Dunlap
Cc: Ravi Bangoria
Cc: Richard Henderson
Cc: Rich Felker
Cc: Russell King
Cc: Sami Tolvanen
Cc: Sargun Dhillon
Cc: Stephen Rothwell
Cc: Sudeep Holla
Cc: Sven Schnelle
Cc: Thiago Jung Bauermann
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vasily Gorbik
Cc: Vlastimil Babka
Cc: Yoshinori Sato
Cc: Zhou Yanjie
Link: http://lkml.kernel.org/r/20200616030734.87257-1-nixiaoming@huawei.com
Signed-off-by: Linus Torvalds

Xiaoming Ni
2020-08-15 10:56:56 +0800

13 Aug, 2020

4 commits

c0f6eda41 mm/alpha: use general page fault accounting ... Browse Code »

Use the general page fault accounting by passing regs into
handle_mm_fault().

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the
other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in
handle_mm_fault().

Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Link: http://lkml.kernel.org/r/20200707225021.200906-3-peterx@redhat.com
Signed-off-by: Linus Torvalds

Peter Xu
2020-08-13 01:58:03 +0800
bce617ede mm: do page fault accounting in handle_mm_fault ... Browse Code »

Patch series "mm: Page fault accounting cleanups", v5.

This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report on an issue a week ago regarding to incorrect page fault
accountings for retried page fault after commit 4064b9827063 ("mm: allow
VM_FAULT_RETRY for multiple times"):

https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

What this series did:

- Correct page fault accounting: we do accounting for a page fault
(no matter whether it's from #PF handling, or gup, or anything else)
only with the one that completed the fault. For example, page fault
retries should not be counted in page fault counters. Same to the
perf events.

- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an adhoc way across different archs.

Case (1): for many archs it's done at the entry of a page fault
handler, so that it will also cover e.g. errornous faults.

Case (2): for some other archs, it is only accounted when the page
fault is resolved successfully.

Case (3): there're still quite some archs that have not enabled
this perf event.

Since this series will touch merely all the archs, we unify this
perf event to always follow case (1), which is the one that makes most
sense. And since we moved the accounting into handle_mm_fault, the
other two MAJ/MIN perf events are well taken care of naturally.

- Unify definition of "major faults": the definition of "major
fault" is slightly changed when used in accounting (not
VM_FAULT_MAJOR). More information in patch 1.

- Always account the page fault onto the one that triggered the page
fault. This does not matter much for #PF handlings, but mostly for
gup. More information on this in patch 25.

Patchset layout:

Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

This patch (of 25):

This is a preparation patch to move page fault accountings into the
general code in handle_mm_fault(). This includes both the per task
flt_maj/flt_min counters, and the major/minor page fault perf events. To
do this, the pt_regs pointer is passed into handle_mm_fault().

PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.

So far, all the pt_regs pointer that passed into handle_mm_fault() is
NULL, which means this patch should have no intented functional change.

Suggested-by: Linus Torvalds
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Cc: Albert Ou
Cc: Alexander Gordeev
Cc: Andy Lutomirski
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Brian Cain
Cc: Catalin Marinas
Cc: Christian Borntraeger
Cc: Chris Zankel
Cc: Dave Hansen
Cc: David S. Miller
Cc: Geert Uytterhoeven
Cc: Gerald Schaefer
Cc: Greentime Hu
Cc: Guo Ren
Cc: Heiko Carstens
Cc: Helge Deller
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Ivan Kokshaysky
Cc: James E.J. Bottomley
Cc: John Hubbard
Cc: Jonas Bonn
Cc: Ley Foon Tan
Cc: "Luck, Tony"
Cc: Matt Turner
Cc: Max Filippov
Cc: Michael Ellerman
Cc: Michal Simek
Cc: Nick Hu
Cc: Palmer Dabbelt
Cc: Paul Mackerras
Cc: Paul Walmsley
Cc: Pekka Enberg
Cc: Peter Zijlstra
Cc: Richard Henderson
Cc: Rich Felker
Cc: Russell King
Cc: Stafford Horne
Cc: Stefan Kristiansson
Cc: Thomas Bogendoerfer
Cc: Thomas Gleixner
Cc: Vasily Gorbik
Cc: Vincent Chen
Cc: Vineet Gupta
Cc: Will Deacon
Cc: Yoshinori Sato
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds

Peter Xu
2020-08-13 01:58:02 +0800
bd72866b8 alpha: fix annotation of io{read,write}{16,32}be() ... Browse Code »

These accessors must be used to read/write a big-endian bus. The value
returned or written is native-endian.

However, these accessors are defined using be{16,32}_to_cpu() or
cpu_to_be{16,32}() to make the endian conversion but these expect a
__be{16,32} when none is present. Keeping them would need a force cast
that would solve nothing at all.

So, do the conversion using swab{16,32}, like done in asm-generic for
similar situations.

Reported-by: kernel test robot
Signed-off-by: Luc Van Oostenryck
Signed-off-by: Andrew Morton
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: Stephen Boyd
Cc: Arnd Bergmann
Link: http://lkml.kernel.org/r/20200622114232.80039-1-luc.vanoostenryck@gmail.com
Signed-off-by: Linus Torvalds

Luc Van Oostenryck
2020-08-13 01:57:59 +0800
428e2976a uaccess: remove segment_eq ... Browse Code »

segment_eq is only used to implement uaccess_kernel. Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Linus Torvalds
Acked-by: Greentime Hu
Acked-by: Geert Uytterhoeven
Cc: Nick Hu
Cc: Vincent Chen
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Link: http://lkml.kernel.org/r/20200710135706.537715-5-hch@lst.de
Signed-off-by: Linus Torvalds

Christoph Hellwig
2020-08-13 01:57:58 +0800

08 Aug, 2020

3 commits

f9cb654cb asm-generic: pgalloc: provide generic pgd_free() ... Browse Code »

Most architectures define pgd_free() as a wrapper for free_page().

Provide a generic version in asm-generic/pgalloc.h and enable its use for
most architectures.

Signed-off-by: Mike Rapoport
Signed-off-by: Andrew Morton
Reviewed-by: Pekka Enberg
Acked-by: Geert Uytterhoeven [m68k]
Cc: Abdul Haleem
Cc: Andy Lutomirski
Cc: Arnd Bergmann
Cc: Christophe Leroy
Cc: Joerg Roedel
Cc: Joerg Roedel
Cc: Max Filippov
Cc: Peter Zijlstra (Intel)
Cc: Satheesh Rajendran
Cc: Stafford Horne
Cc: Stephen Rothwell
Cc: Steven Rostedt
Cc: Matthew Wilcox
Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
Signed-off-by: Linus Torvalds

Mike Rapoport
2020-08-08 02:33:26 +0800
1355c31ee asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one() ... Browse Code »

For most architectures that support >2 levels of page tables,
pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with
__GFP_ZERO and sometimes followed by memset(0) instead.

More elaborate versions on arm64 and x86 account memory for the user page
tables and call to pgtable_pmd_page_ctor() as the part of PMD page
initialization.

Move the arm64 version to include/asm-generic/pgalloc.h and use the
generic version on several architectures.

The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
not enabled, so there is no functional change for most architectures
except of the addition of __GFP_ACCOUNT for allocation of user page
tables.

The pmd_free() is a wrapper for free_page() in all the cases, so no
functional change here.

Signed-off-by: Mike Rapoport
Signed-off-by: Andrew Morton
Reviewed-by: Pekka Enberg
Cc: Matthew Wilcox
Cc: Abdul Haleem
Cc: Andy Lutomirski
Cc: Arnd Bergmann
Cc: Christophe Leroy
Cc: Joerg Roedel
Cc: Joerg Roedel
Cc: Max Filippov
Cc: Peter Zijlstra (Intel)
Cc: Satheesh Rajendran
Cc: Stafford Horne
Cc: Stephen Rothwell
Cc: Steven Rostedt
Cc: Geert Uytterhoeven
Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org
Signed-off-by: Linus Torvalds

Mike Rapoport
2020-08-08 02:33:26 +0800
ca15ca406 mm: remove unneeded includes of <asm/pgalloc.h> ... Browse Code »

Patch series "mm: cleanup usage of "

Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table. These patches add
generic versions of these functions in and enable
use of the generic functions where appropriate.

In addition, functions declared and defined in headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the included all over the place.
The first patch in this series removes unneeded includes of

In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.

This patch (of 8):

In most cases header is required only for allocations of
page table memory. Most of the .c files that include that header do not
use symbols declared in and do not require that header.

As for the other header files that used to include , it is
possible to move that include into the .c file that actually uses symbols
from and drop the include from the header file.

The process was somewhat automated using

sed -i -E '/[
Signed-off-by: Andrew Morton
Reviewed-by: Pekka Enberg
Acked-by: Geert Uytterhoeven [m68k]
Cc: Abdul Haleem
Cc: Andy Lutomirski
Cc: Arnd Bergmann
Cc: Christophe Leroy
Cc: Joerg Roedel
Cc: Max Filippov
Cc: Peter Zijlstra
Cc: Satheesh Rajendran
Cc: Stafford Horne
Cc: Stephen Rothwell
Cc: Steven Rostedt
Cc: Joerg Roedel
Cc: Matthew Wilcox
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
Signed-off-by: Linus Torvalds

Mike Rapoport
2020-08-08 02:33:26 +0800

05 Aug, 2020

3 commits

2ed90dbbf Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping ... Browse Code »

Pull dma-mapping updates from Christoph Hellwig:

- make support for dma_ops optional

- move more code out of line

- add generic support for a dma_ops bypass mode

- misc cleanups

* tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping:
dma-contiguous: cleanup dma_alloc_contiguous
dma-debug: use named initializers for dir2name
powerpc: use the generic dma_ops_bypass mode
dma-mapping: add a dma_ops_bypass flag to struct device
dma-mapping: make support for dma ops optional
dma-mapping: inline the fast path dma-direct calls
dma-mapping: move the remaining DMA API calls out of line

Linus Torvalds
2020-08-05 08:29:57 +0800
4f30a60aa Merge tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux ... Browse Code »

Pull close_range() implementation from Christian Brauner:
"This adds the close_range() syscall. It allows to efficiently close a
range of file descriptors up to all file descriptors of a calling
task.

This is coordinated with the FreeBSD folks which have copied our
version of this syscall and in the meantime have already merged it in
April 2019:

https://reviews.freebsd.org/D21627
https://svnweb.freebsd.org/base?view=revision&revision=359836

The syscall originally came up in a discussion around the new mount
API and making new file descriptor types cloexec by default. During
this discussion, Al suggested the close_range() syscall.

First, it helps to close all file descriptors of an exec()ing task.
This can be done safely via (quoting Al's example from [1] verbatim):

/* that exec is sensitive */
unshare(CLONE_FILES);
/* we don't want anything past stderr here */
close_range(3, ~0U);
execve(....);

The code snippet above is one way of working around the problem that
file descriptors are not cloexec by default. This is aggravated by the
fact that we can't just switch them over without massively regressing
userspace. For a whole class of programs having an in-kernel method of
closing all file descriptors is very helpful (e.g. demons, service
managers, programming language standard libraries, container managers
etc.).

Second, it allows userspace to avoid implementing closing all file
descriptors by parsing through /proc//fd/* and calling close() on
each file descriptor and other hacks. From looking at various
large(ish) userspace code bases this or similar patterns are very
common in service managers, container runtimes, and programming
language runtimes/standard libraries such as Python or Rust.

In addition, the syscall will also work for tasks that do not have
procfs mounted and on kernels that do not have procfs support compiled
in. In such situations the only way to make sure that all file
descriptors are closed is to call close() on each file descriptor up
to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery.

Based on Linus' suggestion close_range() also comes with a new flag
CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
right before exec. This would usually be expressed in the sequence:

unshare(CLONE_FILES);
close_range(3, ~0U);

as pointed out by Linus it might be desirable to have this be a part
of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which
gets especially handy when we're closing all file descriptors above a
certain threshold.

Test-suite as always included"

* tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: add CLOSE_RANGE_UNSHARE tests
close_range: add CLOSE_RANGE_UNSHARE
tests: add close_range() tests
arch: wire-up close_range()
open: add close_range()

Linus Torvalds
2020-08-05 06:12:02 +0800
9ba27414f Merge tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux ... Browse Code »

Pull fork cleanups from Christian Brauner:
"This is cleanup series from when we reworked a chunk of the process
creation paths in the kernel and switched to struct
{kernel_}clone_args.

High-level this does two main things:

- Remove the double export of both do_fork() and _do_fork() where
do_fork() used the incosistent legacy clone calling convention.

Now we only export _do_fork() which is based on struct
kernel_clone_args.

- Remove the copy_thread_tls()/copy_thread() split making the
architecture specific HAVE_COYP_THREAD_TLS config option obsolete.

This switches all remaining architectures to select
HAVE_COPY_THREAD_TLS and thus to the copy_thread_tls() calling
convention. The current split makes the process creation codepaths
more convoluted than they need to be. Each architecture has their own
copy_thread() function unless it selects HAVE_COPY_THREAD_TLS then it
has a copy_thread_tls() function.

The split is not needed anymore nowadays, all architectures support
CLONE_SETTLS but quite a few of them never bothered to select
HAVE_COPY_THREAD_TLS and instead simply continued to use copy_thread()
and use the old calling convention. Removing this split cleans up the
process creation codepaths and paves the way for implementing clone3()
on such architectures since it requires the copy_thread_tls() calling
convention.

After having made each architectures support copy_thread_tls() this
series simply renames that function back to copy_thread(). It also
switches all architectures that call do_fork() directly over to
_do_fork() and the struct kernel_clone_args calling convention. This
is a corollary of switching the architectures that did not yet support
it over to copy_thread_tls() since do_fork() is conditional on not
supporting copy_thread_tls() (Mostly because it lacks a separate
argument for tls which is trivial to fix but there's no need for this
function to exist.).

The do_fork() removal is in itself already useful as it allows to to
remove the export of both do_fork() and _do_fork() we currently have
in favor of only _do_fork(). This has already been discussed back when
we added clone3(). The legacy clone() calling convention is - as is
probably well-known - somewhat odd:

#
# ABI hall of shame
#
config CLONE_BACKWARDS
config CLONE_BACKWARDS2
config CLONE_BACKWARDS3

that is aggravated by the fact that some architectures such as sparc
follow the CLONE_BACKWARDSx calling convention but don't really select
the corresponding config option since they call do_fork() directly.

So do_fork() enforces a somewhat arbitrary calling convention in the
first place that doesn't really help the individual architectures that
deviate from it. They can thus simply be switched to _do_fork()
enforcing a single calling convention. (I really hope that any new
architectures will __not__ try to implement their own calling
conventions...)

Most architectures already have made a similar switch (m68k comes to
mind).

Overall this removes more code than it adds even with a good portion
of added comments. It simplifies a chunk of arch specific assembly
either by moving the code into C or by simply rewriting the assembly.

Architectures that have been touched in non-trivial ways have all been
actually boot and stress tested: sparc and ia64 have been tested with
Debian 9 images. They are the two architectures which have been
touched the most. All non-trivial changes to architectures have seen
acks from the relevant maintainers. nios2 with a custom built
buildroot image. h8300 I couldn't get something bootable to test on
but the changes have been fairly automatic and I'm sure we'll hear
people yell if I broke something there.

All other architectures that have been touched in trivial ways have
been compile tested for each single patch of the series via git rebase
-x "make ..." v5.8-rc2. arm{64} and x86{_64} have been boot tested
even though they have just been trivially touched (removal of the
HAVE_COPY_THREAD_TLS macro from their Kconfig) because well they are
basically "core architectures" and since it is trivial to get your
hands on a useable image"

* tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
arch: rename copy_thread_tls() back to copy_thread()
arch: remove HAVE_COPY_THREAD_TLS
unicore: switch to copy_thread_tls()
sh: switch to copy_thread_tls()
nds32: switch to copy_thread_tls()
microblaze: switch to copy_thread_tls()
hexagon: switch to copy_thread_tls()
c6x: switch to copy_thread_tls()
alpha: switch to copy_thread_tls()
fork: remove do_fork()
h8300: select HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
nios2: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
ia64: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
sparc: unconditionally enable HAVE_COPY_THREAD_TLS
sparc: share process creation helpers between sparc and sparc64
sparc64: enable HAVE_COPY_THREAD_TLS
fork: fold legacy_clone_args_valid() into _do_fork()

Linus Torvalds
2020-08-05 05:47:45 +0800

04 Aug, 2020

1 commit

9ba19ccd2 Merge tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking updates from Ingo Molnar:

- LKMM updates: mostly documentation changes, but also some new litmus
tests for atomic ops.

- KCSAN updates: the most important change is that GCC 11 now has all
fixes in place to support KCSAN, so GCC support can be enabled again.
Also more annotations.

- futex updates: minor cleanups and simplifications

- seqlock updates: merge preparatory changes/cleanups for the
'associated locks' facilities.

- lockdep updates:
- simplify IRQ trace event handling
- add various new debug checks
- simplify header dependencies, split out ,
decouple lockdep from other low level headers some more
- fix NMI handling

- misc cleanups and smaller fixes

* tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
kcsan: Improve IRQ state trace reporting
lockdep: Refactor IRQ trace events fields into struct
seqlock: lockdep assert non-preemptibility on seqcount_t write
lockdep: Add preemption enabled/disabled assertion APIs
seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount()
seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs
seqlock: Reorder seqcount_t and seqlock_t API definitions
seqlock: seqcount_t latch: End read sections with read_seqcount_retry()
seqlock: Properly format kernel-doc code samples
Documentation: locking: Describe seqlock design and usage
locking/qspinlock: Do not include atomic.h from qspinlock_types.h
locking/atomic: Move ATOMIC_INIT into linux/types.h
lockdep: Move list.h inclusion into lockdep.h
locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs
futex: Remove unused or redundant includes
futex: Consistently use fshared as boolean
futex: Remove needless goto's
futex: Remove put_futex_key()
rwsem: fix commas in initialisation
docs: locking: Replace HTTP links with HTTPS ones
...

Linus Torvalds
2020-08-04 05:39:35 +0800

29 Jul, 2020

1 commit

7ca8cf534 locking/atomic: Move ATOMIC_INIT into linux/types.h ... Browse Code »

This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h.
This allows users of atomic_t to use ATOMIC_INIT without having to
include atomic.h as that way may lead to header loops.

Signed-off-by: Herbert Xu
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Waiman Long
Link: https://lkml.kernel.org/r/20200729123105.GB7047@gondor.apana.org.au

Herbert Xu
2020-07-29 22:14:18 +0800

21 Jul, 2020

2 commits

bb7cdd381 alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb() ... Browse Code »

In preparation for removing smp_read_barrier_depends() altogether,
move the Alpha code over to using smp_rmb() and smp_mb() directly.

Acked-by: Peter Zijlstra (Intel)
Acked-by: Paul E. McKenney
Signed-off-by: Will Deacon

Will Deacon
2020-07-21 17:50:36 +0800
d64628588 alpha: Override READ_ONCE() with barriered implementation ... Browse Code »

Rather then relying on the core code to use smp_read_barrier_depends()
as part of the READ_ONCE() definition, instead override __READ_ONCE()
in the Alpha code so that it generates the required mb() and then
implement smp_load_acquire() using the new macro to avoid redundant
back-to-back barriers from the generic implementation.

Acked-by: Peter Zijlstra (Intel)
Acked-by: Paul E. McKenney
Signed-off-by: Will Deacon

Will Deacon
2020-07-21 17:50:35 +0800

19 Jul, 2020

1 commit

2f9237d4f dma-mapping: make support for dma ops optional ... Browse Code »

Avoid the overhead of the dma ops support for tiny builds that only
use the direct mapping.

Signed-off-by: Christoph Hellwig
Tested-by: Alexey Kardashevskiy
Reviewed-by: Alexey Kardashevskiy

Christoph Hellwig
2020-07-19 15:29:23 +0800

05 Jul, 2020

3 commits

714acdbd1 arch: rename copy_thread_tls() back to copy_thread() ... Browse Code »

Now that HAVE_COPY_THREAD_TLS has been removed, rename copy_thread_tls()
back simply copy_thread(). It's a simpler name, and doesn't imply that only
tls is copied here. This finishes an outstanding chunk of internal process
creation work since we've added clone3().

Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer A
Acked-by: Stafford Horne
Acked-by: Greentime Hu
Acked-by: Geert Uytterhoeven A
Reviewed-by: Kees Cook
Signed-off-by: Christian Brauner

Christian Brauner
2020-07-05 05:41:37 +0800
140c8180e arch: remove HAVE_COPY_THREAD_TLS ... Browse Code »

All architectures support copy_thread_tls() now, so remove the legacy
copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
uses the same process creation calling convention based on
copy_thread_tls() and struct kernel_clone_args. This will make it easier to
maintain the core process creation code under kernel/, simplifies the
callpaths and makes the identical for all architectures.

Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer
Acked-by: Greentime Hu
Acked-by: Geert Uytterhoeven
Reviewed-by: Kees Cook
Signed-off-by: Christian Brauner

Christian Brauner
2020-07-05 05:41:37 +0800
0fdfc53f2 alpha: switch to copy_thread_tls() ... Browse Code »

Use the copy_thread_tls() calling convention which passes tls through a
register. This is required so we can remove the copy_thread{_tls}() split
and remove the HAVE_COPY_THREAD_TLS macro.

Cc: linux-alpha@vger.kernel.org
Signed-off-by: Christian Brauner

Christian Brauner
2020-07-05 05:41:36 +0800

17 Jun, 2020

1 commit

9b4feb630 arch: wire-up close_range() ... Browse Code »

This wires up the close_range() syscall into all arches at once.

Suggested-by: Arnd Bergmann
Signed-off-by: Christian Brauner
Reviewed-by: Oleg Nesterov
Acked-by: Arnd Bergmann
Acked-by: Michael Ellerman (powerpc)
Cc: Jann Horn
Cc: David Howells
Cc: Dmitry V. Levin
Cc: Linus Torvalds
Cc: Al Viro
Cc: Florian Weimer
Cc: linux-api@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org

Christian Brauner
2020-06-17 06:07:38 +0800