Eric Lee / smarc-fsl-linux-kernel

06 Jun, 2020

1 commit

ac7b34218 Merge tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull READ_IMPLIES_EXEC changes from Borislav Petkov:
"Split the old READ_IMPLIES_EXEC workaround from executable
PT_GNU_STACK now that toolchains have long supported PT_GNU_STACK marking
and there's no longer any need to force modern programs into having all
their user mappings executable instead of only the stack and the PROT_EXEC
ones.

Disable that automatic READ_IMPLIES_EXEC forcing on x86-64 and
arm64.

Add tables documenting how READ_IMPLIES_EXEC is handled on x86-64, arm
and arm64.

By Kees Cook"
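
As an illustration only, the policy described in this pull can be sketched as a small decision function. This is a reading of the summary above, not the kernel's code: the enum and function names are hypothetical, the sketch assumes NX-capable hardware, and it ignores arm32-specific details. The tables added by these commits remain the authoritative description.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this sketch. */
enum stack_marking {
    GNU_STACK_MISSING,  /* ELF has no PT_GNU_STACK header (old toolchain) */
    GNU_STACK_RWX,      /* PT_GNU_STACK requests an executable stack */
    GNU_STACK_RW        /* PT_GNU_STACK requests a non-executable stack */
};

enum exec_policy {
    EXEC_NONE,   /* only explicit PROT_EXEC mappings are executable */
    EXEC_STACK,  /* the stack is executable, other mappings are not forced */
    EXEC_ALL     /* READ_IMPLIES_EXEC: every readable mapping is executable */
};

/* Assumption: NX-capable CPU; is_64bit means a 64-bit address space. */
static enum exec_policy exec_policy_for(enum stack_marking marking, bool is_64bit)
{
    switch (marking) {
    case GNU_STACK_RWX:
        /* Split from READ_IMPLIES_EXEC: only the stack becomes executable. */
        return EXEC_STACK;
    case GNU_STACK_RW:
        return EXEC_NONE;
    case GNU_STACK_MISSING:
    default:
        /* The automatic forcing is disabled for 64-bit address spaces. */
        return is_64bit ? EXEC_NONE : EXEC_ALL;
    }
}
```

Under these assumptions, a 64-bit binary without PT_GNU_STACK no longer gets every readable mapping made executable, while a 32-bit one still does.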

* tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
arm32/64/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
arm32/64/elf: Add tables to document READ_IMPLIES_EXEC
x86/elf: Disable automatic READ_IMPLIES_EXEC on 64-bit
x86/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
x86/elf: Add table to document READ_IMPLIES_EXEC

Linus Torvalds
2020-06-06 04:45:21 +0800

05 Jun, 2020

13 commits

886d7de63 Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:

- More MM work. 100ish more to go. Mike Rapoport's "mm: remove
__ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

- Various other little subsystems

* emailed patches from Andrew Morton: (127 commits)
lib/ubsan.c: fix gcc-10 warnings
tools/testing/selftests/vm: remove duplicate headers
selftests: vm: pkeys: fix multilib builds for x86
selftests: vm: pkeys: use the correct page size on powerpc
selftests/vm/pkeys: override access right definitions on powerpc
selftests/vm/pkeys: test correct behaviour of pkey-0
selftests/vm/pkeys: introduce a sub-page allocator
selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
selftests/vm/pkeys: associate key on a mapped page and detect write violation
selftests/vm/pkeys: associate key on a mapped page and detect access violation
selftests/vm/pkeys: improve checks to determine pkey support
selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
selftests/vm/pkeys: fix number of reserved powerpc pkeys
selftests/vm/pkeys: introduce powerpc support
selftests/vm/pkeys: introduce generic pkey abstractions
selftests: vm: pkeys: use the correct huge page size
selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
selftests/vm/pkeys: fix pkey_disable_clear()
selftests: vm: pkeys: add helpers for pkey bits
...

Linus Torvalds
2020-06-05 10:18:29 +0800
762a3af6f exec: open code copy_string_kernel

Currently copy_string_kernel is just a wrapper around copy_strings that
simplifies the calling conventions and uses set_fs to allow passing a
kernel pointer. But due to the fact that we only need to handle a single
kernel argument pointer, the logic can be significantly simplified while
getting rid of the set_fs.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Link: http://lkml.kernel.org/r/20200501104105.2621149-3-hch@lst.de
Signed-off-by: Linus Torvalds

Christoph Hellwig
2020-06-05 10:06:26 +0800
986db2d14 exec: simplify the copy_strings_kernel calling convention

copy_strings_kernel is always used with a single argument,
adjust the calling convention to that.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Cc: Alexander Viro
Link: http://lkml.kernel.org/r/20200501104105.2621149-2-hch@lst.de
Signed-off-by: Linus Torvalds

Christoph Hellwig
2020-06-05 10:06:26 +0800
a39630157 fs/seq_file.c: seq_read: Update pr_info_ratelimited

Use a more common logging style.

Add and use pr_fmt, coalesce the format string, align arguments,
use better grammar.

Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Cc: Vasily Averin
Link: http://lkml.kernel.org/r/96ff603230ca1bd60034c36519be3930c3a3a226.camel@perches.com
Signed-off-by: Linus Torvalds

Joe Perches
2020-06-05 10:06:25 +0800
898310032 fat: improve the readahead for FAT entries

The current readahead for FAT entries is very simple but has some flaws,
so it does not work well in some environments. This patch improves the
readahead.

The key changes are:

- make the readahead size tunable by using bdi->ra_pages
- take bdi->io_pages into account to avoid small I/O requests
- update the readahead window before it is fully exhausted

With this patch, on a slow USB-connected 2TB HDD:

[before]
383.18sec

[after]
51.03sec

Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Tested-by: hyeongseok.kim
Reviewed-by: hyeongseok.kim
Link: http://lkml.kernel.org/r/87d08e1dlh.fsf@mail.parknet.co.jp
Signed-off-by: Linus Torvalds

OGAWA Hirofumi
2020-06-05 10:06:25 +0800
b1b65750b fat: don't allow to mount if the FAT length == 0

If the FAT length == 0, the image doesn't have any data, and it can cause
the root dir and FAT entries to overlap.

Windows also treats it as an invalid format.
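
A minimal sketch of the kind of mount-time check this describes, under stated assumptions: the struct and function names are hypothetical, and the two fields stand in for the 16-bit FAT length and the FAT32 32-bit FAT length found in the on-disk boot sector. The actual kernel change may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the boot sector fields involved. */
struct fat_bpb_sketch {
    uint16_t fat16_length;  /* sectors per FAT (FAT12/FAT16 field) */
    uint32_t fat32_length;  /* sectors per FAT (FAT32 field) */
};

/*
 * Returns true when the image declares a non-empty FAT.
 * A volume whose FAT length is zero in both fields has no room for FAT
 * entries, so a mount attempt should be refused.
 */
static bool fat_length_is_valid(const struct fat_bpb_sketch *bpb)
{
    return bpb->fat16_length != 0 || bpb->fat32_length != 0;
}
```

A caller would reject the mount when `fat_length_is_valid()` returns false, before computing the positions of the root directory or the data area.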

Reported-by: syzbot+6f1624f937d9d6911e2d@syzkaller.appspotmail.com
Signed-off-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Cc: Marco Elver
Cc: Dmitry Vyukov
Link: http://lkml.kernel.org/r/87r1wz8mrd.fsf@mail.parknet.co.jp
Signed-off-by: Linus Torvalds

OGAWA Hirofumi
2020-06-05 10:06:25 +0800
852991dd3 fs/binfmt_elf: remove redundant elf_map ifndef

The ifndef was added a long time ago to support archs that would define
their own mapping function. The last user was the metag arch which was
removed from the tree, and as such there are no users left. Let's kill
it.

Signed-off-by: Anthony Iliopoulos
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200402161543.4119-1-ailiop@suse.com
Signed-off-by: Linus Torvalds

Anthony Iliopoulos
2020-06-05 10:06:25 +0800
8977a27b6 proc: rename "catch" function argument

"catch" is reserved keyword in C++, rename it to something both gcc and
g++ accept.

Rename "ign" for symmetry.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Link: http://lkml.kernel.org/r/20200331210905.GA31680@avx2
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2020-06-05 10:06:24 +0800
15a2bc4db Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull execve updates from Eric Biederman:
"Last cycle for the Nth time I ran into bugs and quality of
implementation issues related to exec that could not easily be
fixed because of the way exec is implemented. So I have been digging
into exec and cleaning up what I can.

I don't think I have exec sorted out enough to fix the issues I
started with but I have made some headway this cycle with 4 sets of
changes.

- promised cleanups after introducing exec_update_mutex

- trivial cleanups for exec

- control flow simplifications

- remove the recomputation of bprm->cred

The net result is code that is a bit easier to understand and work
with and a decrease in the number of lines of code (if you don't count
the added tests)"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
exec: Compute file based creds only once
exec: Add a per bprm->file version of per_clear
binfmt_elf_fdpic: fix execfd build regression
selftests/exec: Add binfmt_script regression test
exec: Remove recursion from search_binary_handler
exec: Generic execfd support
exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
exec: Move the call of prepare_binprm into search_binary_handler
exec: Allow load_misc_binary to call prepare_binprm unconditionally
exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
exec: Teach prepare_exec_creds how exec treats uids & gids
exec: Set the point of no return sooner
exec: Move handling of the point of no return to the top level
exec: Run sync_mm_rss before taking exec_update_mutex
exec: Fix spelling of search_binary_handler in a comment
exec: Move the comment from above de_thread to above unshare_sighand
exec: Rename flush_old_exec begin_new_exec
exec: Move most of setup_new_exec into flush_old_exec
exec: In setup_new_exec cache current in the local variable me
...

Linus Torvalds
2020-06-05 05:07:08 +0800
9ff725857 Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull proc updates from Eric Biederman:
"This has four sets of changes:

- modernize proc to support multiple private instances

- ensure we see the exit of each process tid exactly once

- remove has_group_leader_pid

- use pids not tasks in posix-cpu-timers lookup

Alexey updated proc so each mount of proc uses a new superblock. This
allows people to actually use mount options with proc with no fear of
messing up another mount of proc. Given the kernel's internal mounts
of proc for things like uml this was a real problem, and resulted in
Android's hidepid mount options being ignored and introducing security
issues.

The rest of the changes are small cleanups and fixes that came out of
my work to allow this change to proc. In essence it is swapping the
pids in de_thread during exec which removes a special case the code
had to handle. Then updating the code to stop handling that special
case"

* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: proc_pid_ns takes super_block as an argument
remove the no longer needed pid_alive() check in __task_pid_nr_ns()
posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock
posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type
posix-cpu-timers: Extend rcu_read_lock removing task_struct references
signal: Remove has_group_leader_pid
exec: Remove BUG_ON(has_group_leader_pid)
posix-cpu-timer: Unify the now redundant code in lookup_task
posix-cpu-timer: Tidy up group_leader logic in lookup_task
proc: Ensure we see the exit of each process tid exactly once
rculist: Add hlists_swap_heads_rcu
proc: Use PIDTYPE_TGID in next_tgid
Use proc_pid_ns() to get pid_namespace from the proc superblock
proc: use named enums for better readability
proc: use human-readable values for hidepid
docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
proc: add option to mount only a pids subset
proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
proc: allow to mount many instances of proc in one pid namespace
proc: rename struct proc_fs_info to proc_fs_opts

Linus Torvalds
2020-06-05 04:54:34 +0800
051c3556e Merge tag 'for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext2 and reiserfs cleanups from Jan Kara:
"Two small cleanups for ext2 and one for reiserfs"

* tag 'for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: Replace kmalloc with kcalloc in the comment
ext2: code cleanup by removing ifdef macro surrounding
ext2: Fix i_op setting for special inode

Linus Torvalds
2020-06-05 04:53:10 +0800
07c8f3bfe Merge tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
"Several smaller fixes and cleanups for fsnotify subsystem"

* tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix ignore mask logic for events on child and on dir
fanotify: don't write with size under sizeof(response)
fsnotify: Remove proc_fs.h include
fanotify: remove reference to fill_event_metadata()
fsnotify: add mutex destroy
fanotify: prefix should_merge()
fanotify: Replace zero-length array with flexible-array
inotify: Fix error return code assignment flow.
fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()

Linus Torvalds
2020-06-05 04:51:54 +0800
d77d1dbba Merge tag 'zonefs-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs update from Damien Le Moal:
"Only one patch in this pull request to cleanup handling of uuid using
the import_uuid() helper, from Andy"

* tag 'zonefs-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Replace uuid_copy() with import_uuid()

Linus Torvalds
2020-06-05 04:50:13 +0800

04 Jun, 2020

6 commits

ee01c4d72 Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
"More mm/ work, plenty more to come

Subsystems affected by this patch series: slub, memcg, gup, kasan,
pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
thp, mmap, kconfig"

* akpm: (131 commits)
arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
riscv: support DEBUG_WX
mm: add DEBUG_WX support
drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
powerpc/mm: drop platform defined pmd_mknotpresent()
mm: thp: don't need to drain lru cache when splitting and mlocking THP
hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
sparc32: register memory occupied by kernel as memblock.memory
include/linux/memblock.h: fix minor typo and unclear comment
mm, mempolicy: fix up gup usage in lookup_node
tools/vm/page_owner_sort.c: filter out unneeded line
mm: swap: memcg: fix memcg stats for huge pages
mm: swap: fix vmstats for huge pages
mm: vmscan: limit the range of LRU type balancing
mm: vmscan: reclaim writepage is IO cost
mm: vmscan: determine anon/file pressure balance at the reclaim root
mm: balance LRU lists based on relative thrashing
mm: only count actual rotations as LRU reclaim cost
...

Linus Torvalds
2020-06-04 11:24:15 +0800
885902531 hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs

For a 32-bit program running on the arm64 architecture, when the address
space below mmap base is completely exhausted, shmat() for huge pages
returns ENOMEM, but shmat() for normal pages can still succeed in
no-legacy mode. This seems unfair.

For normal pages, the calling trace of get_unmapped_area() is:

    => mm->get_unmapped_area()
        if on legacy mode,
            => arch_get_unmapped_area()
                => vm_unmapped_area()
        if on no-legacy mode,
            => arch_get_unmapped_area_topdown()
                => vm_unmapped_area()

For huge pages, the calling trace of get_unmapped_area() is:

    => file->f_op->get_unmapped_area()
        => hugetlb_get_unmapped_area()
            => vm_unmapped_area()

To solve this issue, we only need to make hugetlb_get_unmapped_area() follow
the same path as mm->get_unmapped_area(). Add *bottomup() and *topdown()
for hugetlbfs, and check the current mm->get_unmapped_area() to decide which
one to use. If mm->get_unmapped_area is equal to
arch_get_unmapped_area_topdown(), hugetlb_get_unmapped_area() calls the
topdown routine; otherwise it calls the bottomup routine.
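
The dispatch described above can be sketched in plain C as a function-pointer comparison. Everything here is a stand-in: the struct, the toy address bounds, and the helper names are hypothetical, and the two search routines return fixed toy answers rather than walking a real address space.

```c
/* Toy address-space bounds, chosen only for the illustration. */
#define SKETCH_BASE 0x10000000UL
#define SKETCH_TOP  0x40000000UL

typedef unsigned long (*get_area_fn)(unsigned long len);

/* Hypothetical stand-in for the per-process mm structure. */
struct mm_sketch {
    get_area_fn get_unmapped_area;  /* legacy (bottom-up) or top-down */
};

/* Toy bottom-up search: always hands out the lowest address. */
static unsigned long area_bottomup(unsigned long len)
{
    (void)len;
    return SKETCH_BASE;
}

/* Toy top-down search: places the mapping just below the top. */
static unsigned long area_topdown(unsigned long len)
{
    return SKETCH_TOP - len;
}

/*
 * Huge-page lookup that mirrors the process's own layout choice:
 * if the mm uses the top-down routine, search top-down, otherwise
 * fall back to the bottom-up search.
 */
static unsigned long hugetlb_area_sketch(const struct mm_sketch *mm,
                                         unsigned long len)
{
    if (mm->get_unmapped_area == area_topdown)
        return area_topdown(len);
    return area_bottomup(len);
}
```

The point of the comparison is that huge-page mappings follow whichever direction the process already uses for normal pages, so the two kinds of mapping draw from the same region of the address space.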

Reported-by: kbuild test robot
Signed-off-by: Shijie Hu
Signed-off-by: Mike Kravetz
Signed-off-by: Andrew Morton
Cc: Will Deacon
Cc: Xiaoming Ni
Cc: Kefeng Wang
Cc: yangerkun
Cc: ChenGang
Cc: Chen Jie
Link: http://lkml.kernel.org/r/20200518065338.113664-1-hushijie3@huawei.com
Signed-off-by: Linus Torvalds

Shijie Hu
2020-06-04 11:09:49 +0800
6058eaec8 mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()

They're the same function, and for the purpose of all callers they are
equivalent to lru_cache_add().

[akpm@linux-foundation.org: fix it for local_lock changes]
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Reviewed-by: Rik van Riel
Acked-by: Michal Hocko
Acked-by: Minchan Kim
Cc: Joonsoo Kim
Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.org
Signed-off-by: Linus Torvalds

Johannes Weiner
2020-06-04 11:09:48 +0800
cb8e59cc8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from David Miller:

1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.

2) Add GSO partial support to igc, from Sasha Neftin.

3) Several cleanups and improvements to r8169 from Heiner Kallweit.

4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.

5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.

6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

8) Add sriov and vf support to hinic, from Luo bin.

9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.

10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.

12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.

13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.

14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.

15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.

16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.

18) Several RISCV bpf jit optimizations, from Luke Nelson.

19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.

20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.

21) Add BPF iterators, from Yonghong Song.

22) Add cable test infrastructure, including ethtool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.

23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.

25) Add CAP_BPF, from Alexei Starovoitov.

26) Support terse dumps in the packet scheduler, from Vlad Buslov.

27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

28) Add devm_register_netdev(), from Bartosz Golaszewski.

29) Minimize qdisc resets, from Cong Wang.

30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...

Linus Torvalds
2020-06-04 07:27:18 +0800
ae03c53d0 Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull splice updates from Al Viro:
"Christoph's assorted splice cleanups"

* 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: rename pipe_buf ->steal to ->try_steal
fs: make the pipe_buf_operations ->confirm operation optional
fs: make the pipe_buf_operations ->steal operation optional
trace: remove tracing_pipe_buf_ops
pipe: merge anon_pipe_buf*_ops
fs: simplify do_splice_from
fs: simplify do_splice_to

Linus Torvalds
2020-06-04 06:52:19 +0800
e7c93cbfe Merge tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread updates from Christian Brauner:
"We have been discussing using pidfds to attach to namespaces for quite
a while and the patches have in one form or another already existed
for about a year. But I wanted to wait to see how the general api
would be received and adopted.

This contains the changes to make it possible to use pidfds to attach
to the namespaces of a process, i.e. they can be passed as the first
argument to the setns() syscall.

When only a single namespace type is specified the semantics are
equivalent to passing an nsfd. That means setns(nsfd, CLONE_NEWNET)
equals setns(pidfd, CLONE_NEWNET).

However, when a pidfd is passed, multiple namespace flags can be
specified in the second setns() argument and setns() will attach the
caller to all the specified namespaces all at once or to none of them.

Specifying 0 is not valid together with a pidfd. Here are just two
obvious examples:

setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET);
setns(pidfd, CLONE_NEWUSER);

Allowing to also attach subsets of namespaces supports various
use-cases where callers setns to a subset of namespaces to retain
privilege, perform an action and then re-attach another subset of
namespaces.

Apart from significantly reducing the number of syscalls needed to
attach to all currently supported namespaces (eight "open+setns"
sequences vs just a single "setns()"), this also allows atomic setns
to a set of namespaces, i.e. either attaching to all namespaces
succeeds or we fail without having changed anything.

This is centered around a new internal struct nsset which holds all
information necessary for a task to switch to a new set of namespaces
atomically. Fwiw, with this change a pidfd becomes the only token
needed to interact with a container. I'm expecting this to be
picked-up by util-linux for nsenter rather soon.

Associated with this change is a shiny new test-suite dedicated to
setns() (for pidfds and nsfds alike)"
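
A minimal userspace sketch of the pattern described in this pull, assuming kernel headers that define SYS_pidfd_open and a kernel with pidfd support in setns(). The helper name is hypothetical. Raw syscalls are used so the sketch does not depend on which libc wrappers are available, and the early rejection of a zero flags argument mirrors the rule stated above rather than replacing the kernel's own check.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEW* flag values */
#include <sys/syscall.h>   /* SYS_pidfd_open, SYS_setns */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: attach the caller to the namespaces of `pid`
 * selected by `flags` (an OR of CLONE_NEW* values) in one setns() call.
 * Returns 0 on success or a negative errno value on failure.
 */
static int join_namespaces_of(pid_t pid, int flags)
{
    int pidfd;
    int ret = 0;

    /* Specifying 0 is not valid together with a pidfd. */
    if (flags == 0)
        return -EINVAL;

    pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -errno;

    /* Either every requested namespace is joined, or none is. */
    if (syscall(SYS_setns, pidfd, flags) < 0)
        ret = -errno;

    close(pidfd);
    return ret;
}
```

A container tool might call `join_namespaces_of(target, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET)`. The call needs sufficient privilege over the target's namespaces, so an unprivileged caller should expect a negative return value.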

* tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/pidfd: add pidfd setns tests
nsproxy: attach to namespaces via pidfds
nsproxy: add struct nsset

Linus Torvalds
2020-06-04 04:12:57 +0800

03 Jun, 2020

20 commits

d6f9469a0 Merge tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
"The most interesting part is the new mount api conversion, which is
actually an old patch already pending for several cycles. And the
others are recent trivial cleanups here.

Summary:

- Convert to use the new mount apis

- Some random cleanup patches"

* tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: suppress false positive last_block warning
erofs: convert to use the new mount fs_context api
erofs: code cleanup by removing ifdef macro surrounding

Linus Torvalds
2020-06-03 11:16:55 +0800
cadf32234 Merge tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggy

Pull JFS update from David Kleikamp:
"Replace zero-length array in JFS"

* tag 'jfs-5.8' of git://github.com/kleikamp/linux-shaggy:
jfs: Replace zero-length array with flexible-array member

Linus Torvalds
2020-06-03 11:11:35 +0800
f3cdc8ae1 Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
"Highlights:

- speedup dead root detection during orphan cleanup, eg. when there
are many deleted subvolumes waiting to be cleaned, the trees are
now looked up in a radix tree instead of an O(N^2) search

- snapshot creation with inherited qgroup will mark the qgroup
inconsistent, requires a rescan

- send will emit file capabilities after chown, this produces a
stream that does not need postprocessing to set the capabilities
again

- direct io ported to iomap infrastructure, cleaned up and simplified
code, notably removing last use of struct buffer_head in btrfs code

Core changes:

- factor out backreference iteration, to be used by ordinary
backreferences and relocation code

- improved global block reserve utilization
* better logic to serialize requests
* increased maximum available for unlink
* improved handling on large pages (64K)

- direct io cleanups and fixes
* simplify layering, where cloned bios were unnecessarily created
for some cases
* error handling fixes (submit, endio)
* remove repair worker thread, used to avoid deadlocks during
repair

- refactored block group reading code, preparatory work for new type
of block group storage that should improve mount time on large
filesystems

Cleanups:

- cleaned up (and slightly sped up) set/get helpers for metadata data
structure members

- root bit REF_COWS got renamed to SHAREABLE to reflect that the
blocks of the tree get shared either among subvolumes or with the
relocation trees

Fixes:

- when subvolume deletion fails due to ENOSPC, the filesystem is not
turned read-only

- device scan deals with devices from other filesystems that changed
ownership due to overwrite (mkfs)

- fix a race between scrub and block group removal/allocation

- fix long standing bug of a runaway balance operation, printing the
same line to the syslog, caused by a stale status bit on a reloc
tree that prevented progress

- fix corrupt log due to concurrent fsync of inodes with shared
extents

- fix space underflow for NODATACOW and buffered writes when it for
some reason needs to fall back to COW mode"

* tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
btrfs: fix space_info bytes_may_use underflow during space cache writeout
btrfs: fix space_info bytes_may_use underflow after nocow buffered write
btrfs: fix wrong file range cleanup after an error filling dealloc range
btrfs: remove redundant local variable in read_block_for_search
btrfs: open code key_search
btrfs: split btrfs_direct_IO to read and write part
btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
fs: remove dio_end_io()
btrfs: switch to iomap_dio_rw() for dio
iomap: remove lockdep_assert_held()
iomap: add a filesystem hook for direct I/O bio submission
fs: export generic_file_buffered_read()
btrfs: turn space cache writeout failure messages into debug messages
btrfs: include error on messages about failure to write space/inode caches
btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
btrfs: make checksum item extension more efficient
btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
btrfs: unexport btrfs_compress_set_level()
btrfs: simplify iget helpers
...

Linus Torvalds
2020-06-03 10:59:25 +0800
8eeae5bae Merge tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull DAX updates part two from Darrick Wong:
"This time around, we're hoisting the DONTCACHE flag from XFS into the
VFS so that we can make the incore DAX mode changes become effective
sooner.

We can't change the file data access mode on a live inode because we
don't have a safe way to change the file ops pointers. The incore
state change becomes effective at inode loading time, which can happen
if the inode is evicted. Therefore, we're making it so that
filesystems can ask the VFS to evict the inode as soon as the last
holder drops.

The per-fs changes to make this call will be in subsequent pull
requests from Ted and myself.

Summary:

- Introduce DONTCACHE flags for dentries and inodes. This hint will
cause the VFS to drop the associated objects immediately after the
last put, so that we can change the file access mode (DAX or page
cache) on the fly"
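
The "drop immediately after the last put" hint can be illustrated with a toy reference-counted object. This is only a sketch of the idea: the struct, field, and function names are hypothetical and do not correspond to the VFS data structures.

```c
#include <stdbool.h>

/* Hypothetical cached object with a DONTCACHE-style hint. */
struct cached_obj {
    int  refcount;   /* number of current holders */
    bool dontcache;  /* hint: do not keep around once unused */
    bool in_cache;   /* still resident in the cache? */
};

/*
 * Drop one reference. Returns true when the object was evicted because
 * this was the last holder and the dontcache hint was set. Without the
 * hint, an unused object stays cached for possible reuse.
 */
static bool obj_put(struct cached_obj *obj)
{
    if (--obj->refcount > 0)
        return false;        /* other holders remain */

    if (obj->dontcache) {
        obj->in_cache = false;
        return true;         /* evicted right after the last put */
    }
    return false;            /* kept in the cache for later lookups */
}
```

Evicting at the last put means the next lookup reloads the object from scratch, which is the point at which a changed access mode can take effect.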

* tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: Introduce DCACHE_DONTCACHE
fs: Lift XFS_IDONTCACHE to the VFS layer

Linus Torvalds
2020-06-03 10:48:41 +0800
96ed320d5 Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull DAX updates part one from Darrick Wong:
"After many years of LKML-wrangling about how to enable programs to
query and influence the file data access mode (DAX) when a filesystem
resides on storage devices such as persistent memory, Ira Weiny has
emerged with a proposed set of standard behaviors that has not been
shot down by anyone! We're more or less standardizing on the current
XFS behavior and adapting ext4 to do the same.

This is the first of a handful of pull requests that will make ext4 and
XFS present a consistent interface for user programs that care about
DAX. We add a statx attribute that programs can check to see if DAX is
enabled on a particular file. Then, we update the DAX documentation to
spell out the user-visible behaviors that filesystems will guarantee
(until the next storage industry shakeup). The on-disk inode flag has
been in XFS for a few years now.

Summary:

- Clean up io_is_direct.

- Add a new statx flag to indicate when file data access is being
done via DAX (as opposed to the page cache).

- Update the documentation for how system administrators and
application programmers can take advantage of the (still
experimental) DAX feature"

Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/

* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
Documentation/dax: Update Usage section
fs/stat: Define DAX statx attribute
fs: Remove unneeded IS_DAX() check in io_is_direct()

Linus Torvalds
2020-06-03 10:45:12 +0800
16d91548d Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
"Most of the changes this cycle are refactoring of existing code in
preparation for things landing in the future.

We also fixed various problems and deficiencies in the quota
implementation, and (I hope) the last of the stale read vectors by
forcing write allocations to go through the unwritten state until the
write completes.

Summary:

- Various cleanups to remove dead code, unnecessary conditionals,
asserts, etc.

- Fix a linker warning caused by xfs stuffing '-g' into CFLAGS
redundantly.

- Tighten up our dmesg logging to ensure that everything is prefixed
with 'XFS' for easier grepping.

- Kill a bunch of typedefs.

- Refactor the deferred ops code to reduce indirect function calls.

- Increase type-safety with the deferred ops code.

- Make the DAX mount options a tri-state.

- Fix some error handling problems in the inode flush code and clean
up other inode flush warts.

- Refactor log recovery so that each log item's recovery functions now
live with the other log item processing code.

- Fix some SPDX forms.

- Fix quota counter corruption if the fs crashes after running
quotacheck but before any dquots get logged.

- Don't fail metadata verification on zero-entry attr leaf blocks,
since they're just part of the disk format now due to a historic
lack of log atomicity.

- Don't allow SWAPEXT between files with different [ugp]id when
quotas are enabled.

- Refactor inode fork reading and verification to run directly from
the inode-from-disk function. This means that we now actually
guarantee that _iget'ted inodes are totally verified and ready to
go.

- Move the incore inode fork format and extent counts to the ifork
structure.

- Scalability improvements by reducing cacheline pingponging in
struct xfs_mount.

- More scalability improvements by removing m_active_trans from the
hot path.

- Fix inode counter update sanity checking to run /only/ on debug
kernels.

- Fix longstanding inconsistency in what error code we return when a
program hits project quota limits (ENOSPC).

- Fix group quota returning the wrong error code when a program hits
group quota limits.

- Fix per-type quota limits and grace periods for group and project
quotas so that they actually work.

- Allow extension of individual grace periods.

- Refactor the non-reclaim inode radix tree walking code to remove a
bunch of stupid little functions and straighten out the
inconsistent naming schemes.

- Fix a bug in speculative preallocation where we measured a new
allocation based on the last extent mapping in the file instead of
looking farther for the last contiguous space allocation.

- Force delalloc writes to unwritten extents. This closes a stale
disk contents exposure vector if the system goes down before the
write completes.

- More lockdep whackamole"

* tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (129 commits)
xfs: more lockdep whackamole with kmem_alloc*
xfs: force writes to delalloc regions to unwritten
xfs: refactor xfs_iomap_prealloc_size
xfs: measure all contiguous previous extents for prealloc size
xfs: don't fail unwritten extent conversion on writeback due to edquot
xfs: rearrange xfs_inode_walk_ag parameters
xfs: straighten out all the naming around incore inode tree walks
xfs: move xfs_inode_ag_iterator to be closer to the perag walking code
xfs: use bool for done in xfs_inode_ag_walk
xfs: fix inode ag walk predicate function return values
xfs: refactor eofb matching into a single helper
xfs: remove __xfs_icache_free_eofblocks
xfs: remove flags argument from xfs_inode_ag_walk
xfs: remove xfs_inode_ag_iterator_flags
xfs: remove unused xfs_inode_ag_iterator function
xfs: replace open-coded XFS_ICI_NO_TAG
xfs: move eofblocks conversion function to xfs_ioctl.c
xfs: allow individual quota grace period extension
xfs: per-type quota timers and warn limits
xfs: switch xfs_get_defquota to take explicit type
...

Linus Torvalds
2020-06-03 10:21:40 +0800
1ee08de1e Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:
"A relatively quiet round, mostly just fixes and code improvements. In
particular:

- Make statx just use the generic statx handler, instead of open
coding it. We don't need that anymore, as we always call it async
safe (Bijan)

- Enable closing of the ring itself. Also fixes O_PATH closure (me)

- Properly name completion members (me)

- Batch reap of dead file registrations (me)

- Allow IORING_OP_POLL with double waitqueues (me)

- Add tee(2) support (Pavel)

- Remove double off read (Pavel)

- Fix overflow cancellations (Pavel)

- Improve CQ timeouts (Pavel)

- Async defer drain fixes (Pavel)

- Add support for enabling/disabling notifications on a registered
eventfd (Stefano)

- Remove dead state parameter (Xiaoguang)

- Disable SQPOLL submit on dying ctx (Xiaoguang)

- Various code cleanups"

* tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
io_uring: fix overflowed reqs cancellation
io_uring: off timeouts based only on completions
io_uring: move timeouts flushing to a helper
statx: hide interfaces no longer used by io_uring
io_uring: call statx directly
statx: allow system call to be invoked from io_uring
io_uring: add io_statx structure
io_uring: get rid of manual punting in io_close
io_uring: separate DRAIN flushing into a cold path
io_uring: don't re-read sqe->off in timeout_prep()
io_uring: simplify io_timeout locking
io_uring: fix flush req->refs underflow
io_uring: don't submit sqes when ctx->refs is dying
io_uring: async task poll trigger cleanup
io_uring: add tee(2) support
splice: export do_tee()
io_uring: don't repeat valid flag list
io_uring: rename io_file_put()
io_uring: remove req->needs_fixed_files
io_uring: cleanup io_poll_remove_one() logic
...

Linus Torvalds
2020-06-03 06:42:50 +0800
bce159d73 Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
"On top of the core changes, here are the block driver changes for this
merge window:

- NVMe changes:
- NVMe over Fibre Channel protocol updates, which also reach
over to drivers/scsi/lpfc (James Smart)
- namespace revalidation support on the target (Anthony
Iliopoulos)
- gcc zero length array fix (Arnd Bergmann)
- nvmet cleanups (Chaitanya Kulkarni)
- misc cleanups and fixes (me, Keith Busch, Sagi Grimberg)
- use a SRQ per completion vector (Max Gurtovoy)
- fix handling of runtime changes to the queue count (Weiping
Zhang)
- t10 protection information support for nvme-rdma and
nvmet-rdma (Israel Rukshin and Max Gurtovoy)
- target side AEN improvements (Chaitanya Kulkarni)
- various fixes and minor improvements all over, including the
nvme part of the lpfc driver

- Floppy code cleanup series (Willy, Denis)

- Floppy contention fix (Jiri)

- Loop CONFIGURE support (Martijn)

- bcache fixes/improvements (Coly, Joe, Colin)

- q->queuedata cleanups (Christoph)

- Get rid of ioctl_by_bdev (Christoph, Stefan)

- md/raid5 allocation fixes (Coly)

- zero length array fixes (Gustavo)

- swim3 task state fix (Xu)"

* tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits)
bcache: configure the asynchronous registertion to be experimental
bcache: asynchronous devices registration
bcache: fix refcount underflow in bcache_device_free()
bcache: Convert pr_ uses to a more typical style
bcache: remove redundant variables i and n
lpfc: Fix return value in __lpfc_nvme_ls_abort
lpfc: fix axchg pointer reference after free and double frees
lpfc: Fix pointer checks and comments in LS receive refactoring
nvme: set dma alignment to qword
nvmet: cleanups the loop in nvmet_async_events_process
nvmet: fix memory leak when removing namespaces and controllers concurrently
nvmet-rdma: add metadata/T10-PI support
nvmet: add metadata support for block devices
nvmet: add metadata/T10-PI support
nvme: add Metadata Capabilities enumerations
nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
nvmet: rename nvmet_rw_len to nvmet_rw_data_len
nvmet: add metadata characteristics for a namespace
nvme-rdma: add metadata/T10-PI support
nvme-rdma: introduce nvme_rdma_sgl structure
...

Linus Torvalds
2020-06-03 06:37:03 +0800
750a02ab8 Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
"Core block changes that have been queued up for this release:

- Remove dead blk-throttle and blk-wbt code (Guoqing)

- Include pid in blktrace note traces (Jan)

- Don't spew I/O errors on wouldblock termination (me)

- Zone append addition (Johannes, Keith, Damien)

- IO accounting improvements (Konstantin, Christoph)

- blk-mq hardware map update improvements (Ming)

- Scheduler dispatch improvement (Salman)

- Inline block encryption support (Satya)

- Request map fixes and improvements (Weiping)

- blk-iocost tweaks (Tejun)

- Fix for timeout failing with error injection (Keith)

- Queue re-run fixes (Douglas)

- CPU hotplug improvements (Christoph)

- Queue entry/exit improvements (Christoph)

- Move DMA drain handling to the few drivers that use it (Christoph)

- Partition handling cleanups (Christoph)"

* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
block: mark bio_wouldblock_error() bio with BIO_QUIET
blk-wbt: rename __wbt_update_limits to wbt_update_limits
blk-wbt: remove wbt_update_limits
blk-throttle: remove tg_drain_bios
blk-throttle: remove blk_throtl_drain
null_blk: force complete for timeout request
blk-mq: drain I/O when all CPUs in a hctx are offline
blk-mq: add blk_mq_all_tag_iter
blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
blk-mq: use BLK_MQ_NO_TAG in more places
blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
blk-mq: move more request initialization to blk_mq_rq_ctx_init
blk-mq: simplify the blk_mq_get_request calling convention
blk-mq: remove the bio argument to ->prepare_request
nvme: force complete cancelled requests
blk-mq: blk-mq: provide forced completion method
block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
block: blk-crypto-fallback: remove redundant initialization of variable err
block: reduce part_stat_lock() scope
block: use __this_cpu_add() instead of access by smp_processor_id()
...

Linus Torvalds
2020-06-03 06:29:19 +0800
355ba37d7 Merge tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"These rework the system-wide PM driver flags, make runtime switching
of cpuidle governors easier, improve the user space hibernation
interface code, add intel-speed-select interface documentation, add
more debug messages to the ACPI code handling suspend to idle, update
the cpufreq core and drivers, fix a minor issue in the cpuidle core
and update two cpuidle drivers, improve the PM-runtime framework,
update the Intel RAPL power capping driver, update devfreq core and
drivers, and clean up the cpupower utility.

Specifics:

- Rework the system-wide PM driver flags to make them easier to
understand and use and update their documentation (Rafael Wysocki,
Alan Stern).

- Allow cpuidle governors to be switched at run time regardless of
the kernel configuration and update the related documentation
accordingly (Hanjun Guo).

- Improve the resume device handling in the user space hibernation
interface code (Domenico Andreoli).

- Document the intel-speed-select sysfs interface (Srinivas
Pandruvada).

- Make the ACPI code handling suspend to idle print more debug
messages to help diagnose issues with it (Rafael Wysocki).

- Fix a helper routine in the cpufreq core and correct a typo in the
struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang
Wenhu).

- Update cpufreq drivers:

- Make the intel_pstate driver start in the passive mode by
default on systems without HWP (Rafael Wysocki).

- Add i.MX7ULP support to the imx-cpufreq-dt driver and add
i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan).

- Convert the qoriq cpufreq driver to a platform one, make the
platform code create a suitable device object for it and add
platform dependencies to it (Mian Yousaf Kaukab, Geert
Uytterhoeven).

- Fix wrong compatible binding in the qcom driver (Ansuel Smith).

- Build the omap driver by default for ARCH_OMAP2PLUS (Anders
Roxell).

- Add r8a7742 SoC support to the dt cpufreq driver (Lad
Prabhakar).

- Update cpuidle core and drivers:

- Fix three reference count leaks in error code paths in the
cpuidle core (Qiushi Wu).

- Convert Qualcomm SPM to a generic cpuidle driver (Stephan
Gerhold).

- Fix up the execution order when entering a domain idle state in
the PSCI driver (Ulf Hansson).

- Fix a reference counting issue related to clock management and
clean up two oddities in the PM-runtime framework (Rafael Wysocki,
Andy Shevchenko).

- Add ElkhartLake support to the Intel RAPL power capping driver and
remove an unused local MSR definition from it (Jacob Pan, Sumeet
Pawnikar).

- Update devfreq core and drivers:

- Replace strncpy() with strscpy() in the devfreq core and use
lockdep asserts instead of manual checks for a locked mutex in
it (Dmitry Osipenko, Krzysztof Kozlowski).

- Add a generic imx bus scaling driver and make it register an
interconnect device (Leonard Crestez, Gustavo A. R. Silva).

- Make the cpufreq notifier in the tegra30 driver take boosting
into account and delete an unuseful error message from that
driver (Dmitry Osipenko, Markus Elfring).

- Remove unneeded semicolon from the cpupower code (Zou Wei)"

* tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits)
cpuidle: Fix three reference count leaks
PM: runtime: Replace pm_runtime_callbacks_present()
PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex
PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR
PM / devfreq: Replace strncpy with strscpy
PM / devfreq: imx: Register interconnect device
PM / devfreq: Add generic imx bus scaling driver
PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe()
PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting
PM: hibernate: Restrict writes to the resume device
PM: runtime: clk: Fix clk_pm_runtime_get() error path
cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver
ACPI: EC: PM: s2idle: Extend GPE dispatching debug message
ACPI: PM: s2idle: Print type of wakeup debug messages
powercap: RAPL: remove unused local MSR define
PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend()
Documentation: admin-guide: pm: Document intel-speed-select
PM: hibernate: Split off snapshot dev option
PM: hibernate: Incorporate concurrency handling
Documentation: ABI: make current_governer_ro as a candidate for removal
...

Linus Torvalds
2020-06-03 04:17:23 +0800
94709049f Merge branch 'akpm' (patches from Andrew)

Merge updates from Andrew Morton:
"A few little subsystems and a start of a lot of MM patches.

Subsystems affected by this patch series: squashfs, ocfs2, parisc,
vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup,
swap, memcg, pagemap, memory-failure, vmalloc, kasan"

* emailed patches from Andrew Morton : (128 commits)
kasan: move kasan_report() into report.c
mm/mm_init.c: report kasan-tag information stored in page->flags
ubsan: entirely disable alignment checks under UBSAN_TRAP
kasan: fix clang compilation warning due to stack protector
x86/mm: remove vmalloc faulting
mm: remove vmalloc_sync_(un)mappings()
x86/mm/32: implement arch_sync_kernel_mappings()
x86/mm/64: implement arch_sync_kernel_mappings()
mm/ioremap: track which page-table levels were modified
mm/vmalloc: track which page-table levels were modified
mm: add functions to track page directory modifications
s390: use __vmalloc_node in stack_alloc
powerpc: use __vmalloc_node in alloc_vm_stack
arm64: use __vmalloc_node in arch_alloc_vmap_stack
mm: remove vmalloc_user_node_flags
mm: switch the test_vmalloc module to use __vmalloc_node
mm: remove __vmalloc_node_flags_caller
mm: remove both instances of __vmalloc_node_flags
mm: remove the prot argument to __vmalloc_node
mm: remove the pgprot argument to __vmalloc
...

Linus Torvalds
2020-06-03 03:21:36 +0800
88dca4ca5 mm: remove the pgprot argument to __vmalloc

The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Reviewed-by: Michael Kelley [hyperv]
Acked-by: Gao Xiang [erofs]
Acked-by: Peter Zijlstra (Intel)
Acked-by: Wei Liu
Cc: Christian Borntraeger
Cc: Christophe Leroy
Cc: Daniel Vetter
Cc: David Airlie
Cc: Greg Kroah-Hartman
Cc: Haiyang Zhang
Cc: Johannes Weiner
Cc: "K. Y. Srinivasan"
Cc: Laura Abbott
Cc: Mark Rutland
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Robin Murphy
Cc: Sakari Ailus
Cc: Stephen Hemminger
Cc: Sumit Semwal
Cc: Benjamin Herrenschmidt
Cc: Catalin Marinas
Cc: Heiko Carstens
Cc: Paul Mackerras
Cc: Vasily Gorbik
Cc: Will Deacon
Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
Signed-off-by: Linus Torvalds

Christoph Hellwig
2020-06-03 01:59:11 +0800
d4efd79a8 mm: remove the prot argument from vm_map_ram

This is always PAGE_KERNEL - for long term mappings with other properties
vmap should be used.

Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Acked-by: Peter Zijlstra (Intel)
Cc: Christian Borntraeger
Cc: Christophe Leroy
Cc: Daniel Vetter
Cc: David Airlie
Cc: Gao Xiang
Cc: Greg Kroah-Hartman
Cc: Haiyang Zhang
Cc: Johannes Weiner
Cc: "K. Y. Srinivasan"
Cc: Laura Abbott
Cc: Mark Rutland
Cc: Michael Kelley
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Robin Murphy
Cc: Sakari Ailus
Cc: Stephen Hemminger
Cc: Sumit Semwal
Cc: Wei Liu
Cc: Benjamin Herrenschmidt
Cc: Catalin Marinas
Cc: Heiko Carstens
Cc: Paul Mackerras
Cc: Vasily Gorbik
Cc: Will Deacon
Link: http://lkml.kernel.org/r/20200414131348.444715-19-hch@lst.de
Signed-off-by: Linus Torvalds

Christoph Hellwig
2020-06-03 01:59:11 +0800
c94b6923f /proc/PID/smaps: Add PMD migration entry parsing

Currently, when reading /proc/PID/smaps, PMD migration entries in the page
table are simply ignored. To improve the accuracy of /proc/PID/smaps, add
parsing and processing of these entries.

To test the patch, we run pmbench to eat 400 MB memory in background,
then run /usr/bin/migratepages and `cat /proc/PID/smaps` every second.
The issue as follows can be reproduced within 60 seconds.

Before the patch, for the fully populated 400 MB anonymous VMA, some THP
pages under migration may be lost as below.

7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
Size: 409600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 407552 kB
Pss: 407552 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 407552 kB
Referenced: 301056 kB
Anonymous: 407552 kB
LazyFree: 0 kB
AnonHugePages: 405504 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 1
VmFlags: rd wr mr mw me ac

After the patch, it will always be:

7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
Size: 409600 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 409600 kB
Pss: 409600 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 409600 kB
Referenced: 294912 kB
Anonymous: 409600 kB
LazyFree: 0 kB
AnonHugePages: 407552 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 1
VmFlags: rd wr mr mw me ac
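
The gap between the two dumps above can be checked with a little arithmetic. The sketch below is illustrative only: the helper name is hypothetical, and the 2048 kB size of a PMD-mapped THP is an assumption (it holds on x86-64 with 4 kB base pages, which the message does not state explicitly).

```c
/* Assumed size of one PMD-mapped transparent huge page, in kB.
 * 2 MiB on x86-64 with 4 kB base pages; not stated in the message. */
#define ASSUMED_THP_KB 2048UL

/* Hypothetical helper: how many assumed-size THPs are missing from Rss
 * for a VMA that is expected to be fully populated. */
static unsigned long missing_thps(unsigned long size_kb, unsigned long rss_kb)
{
	if (rss_kb >= size_kb)
		return 0;
	return (size_kb - rss_kb) / ASSUMED_THP_KB;
}
```

With the "before" values (Size 409600 kB, Rss 407552 kB) the helper returns 1, which under this assumption corresponds to a single huge page under migration being left out of the accounting. The AnonHugePages lines differ by the same 2048 kB.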

Signed-off-by: "Huang, Ying"
Signed-off-by: Andrew Morton
Reviewed-by: Zi Yan
Acked-by: Michal Hocko
Acked-by: Kirill A. Shutemov
Acked-by: Vlastimil Babka
Cc: Andrea Arcangeli
Cc: Alexey Dobriyan
Cc: Konstantin Khlebnikov
Cc: "Jérôme Glisse"
Cc: Yang Shi
Link: http://lkml.kernel.org/r/20200403123059.1846960-1-ying.huang@intel.com
Signed-off-by: Linus Torvalds

Huang Ying
2020-06-03 01:59:10 +0800
8d92890bd mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead

After an NFS page has been written it is considered "unstable" until a
COMMIT request succeeds. If the COMMIT fails, the page will be
re-written.

These "unstable" pages are currently accounted as "reclaimable", either
in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a
'reclaimable' count. This might have made sense when sending the COMMIT
required a separate action by the VFS/MM (e.g. releasepage() used to
send a COMMIT). However now that all writes generated by ->writepages()
will automatically be followed by a COMMIT (since commit 919e3bd9a875
("NFS: Ensure we commit after writeback is complete")) it makes more
sense to treat them as writeback pages.

So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in
NR_WRITEBACK and WB_WRITEBACK.

A particular effect of this change is that when
wb_check_background_flush() calls wb_over_bg_threshold(), the latter
will report 'true' a lot less often as the 'unstable' pages are no
longer considered 'dirty' (as there is nothing that writeback can do
about them anyway).

Currently wb_check_background_flush() will trigger writeback to NFS even
when there are relatively few dirty pages (if there are lots of unstable
pages). This can result in small writes going to the server (10s of
Kilobytes rather than a Megabyte), which hurts throughput. With this
patch, there are fewer writes, which are each larger on average.
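
The effect on the background threshold check can be illustrated with a simplified model. This is a sketch, not the kernel's wb_over_bg_threshold(): the struct, field, and function names are hypothetical, and only the global comparison is modelled.

```c
#include <stdbool.h>

/* Simplified global page counters; names are illustrative only. */
struct page_counts {
	unsigned long file_dirty;   /* dirty page-cache pages */
	unsigned long unstable_nfs; /* written to the server, COMMIT pending */
};

/* Before the patch: unstable pages count towards the background threshold. */
static bool over_bg_thresh_before(const struct page_counts *c,
				  unsigned long bg_thresh)
{
	return c->file_dirty + c->unstable_nfs > bg_thresh;
}

/* After the patch: unstable pages are accounted as writeback, so only
 * dirty pages are compared against the threshold. */
static bool over_bg_thresh_after(const struct page_counts *c,
				 unsigned long bg_thresh)
{
	return c->file_dirty > bg_thresh;
}
```

With few dirty pages and many unstable ones, the "before" check fires and triggers a small write, while the "after" check waits until enough dirty pages have accumulated.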

Where the NR_UNSTABLE_NFS count was included in statistics
virtual-files, the entry is retained, but the value is hard-coded as
zero. Static trace points and warning printks which mentioned this
counter no longer report it.

[akpm@linux-foundation.org: re-layout comment]
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: NeilBrown
Signed-off-by: Andrew Morton
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Acked-by: Trond Myklebust
Acked-by: Michal Hocko [mm]
Cc: Christoph Hellwig
Cc: Chuck Lever
Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds

NeilBrown
2020-06-03 01:59:08 +0800
a37b0715d mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE

PF_LESS_THROTTLE exists for loop-back nfsd (and a similar need in the
loop block driver and callers of prctl(PR_SET_IO_FLUSHER)), where a
daemon needs to write to one bdi (the final bdi) in order to free up
writes queued to another bdi (the client bdi).

The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
pages, so that it can still dirty pages after other processes have been
throttled. The purpose of this is to avoid the deadlock that happens when
the PF_LESS_THROTTLE process must write for any dirty pages to be freed,
but it is being throttled and cannot write.

This approach was designed when all threads were blocked equally,
independently of which device they were writing to, or how fast it was.
Since that time the writeback algorithm has changed substantially with
different threads getting different allowances based on non-trivial
heuristics. This means the simple "add 25%" heuristic is no longer
reliable.

The important issue is not that the daemon needs a *larger* dirty page
allowance, but that it needs a *private* dirty page allowance, so that
dirty pages for the "client" bdi that it is helping to clear (the bdi
for an NFS filesystem or loop block device etc) do not affect the
throttling of the daemon writing to the "final" bdi.

This patch changes the heuristic so that the task is not throttled when
the bdi it is writing to has a dirty page count below (or equal
to) the free-run threshold for that bdi. This ensures it will always be
able to have some pages in flight, and so will not deadlock.

In a steady-state, it is expected that PF_LOCAL_THROTTLE tasks might
still be throttled by global threshold, but that is acceptable as it is
only the deadlock state that is interesting for this flag.

This approach of "only throttle when target bdi is busy" is consistent
with the other use of PF_LESS_THROTTLE in current_may_throttle(), where
it causes attention to be focussed only on the target bdi.

So this patch
- renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
- removes the 25% bonus that that flag gives, and
- does not delay at all when PF_LOCAL_THROTTLE is set, unless the
global and the local free-run thresholds are exceeded.

Note that previously realtime threads were treated the same as
PF_LESS_THROTTLE threads. This patch does *not* change the behaviour
for real-time threads, so it is now different from the behaviour of nfsd
and loop tasks. I don't know what is wanted for realtime.
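
The new rule can be sketched as a small decision function. This is a simplified model under stated assumptions, not the kernel's balance_dirty_pages(): the struct, its fields, and the function name are hypothetical, and the real code handles further cases (cgroup writeback, rate limiting) that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters relevant to the decision. */
struct dirty_snapshot {
	unsigned long global_dirty;   /* dirty pages system-wide */
	unsigned long global_freerun; /* global free-run threshold */
	unsigned long bdi_dirty;      /* dirty pages for the target bdi */
	unsigned long bdi_freerun;    /* free-run threshold for that bdi */
};

/* Returns true when the writing task may proceed without any delay. */
static bool may_skip_throttle(const struct dirty_snapshot *s,
			      bool local_throttle)
{
	/* Below the global free-run threshold nobody is delayed. */
	if (s->global_dirty <= s->global_freerun)
		return true;

	/* A PF_LOCAL_THROTTLE-style task is delayed only if the target
	 * bdi is over its own free-run threshold as well. */
	if (local_throttle && s->bdi_dirty <= s->bdi_freerun)
		return true;

	return false;
}
```

In this model a flagged daemon writing to an idle "final" bdi keeps making progress even when the system as a whole is over the global threshold, which is the deadlock-avoidance property described above.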

[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: NeilBrown
Signed-off-by: Andrew Morton
Reviewed-by: Jan Kara
Acked-by: Chuck Lever [nfsd]
Cc: Christoph Hellwig
Cc: Michal Hocko
Cc: Trond Myklebust
Link: http://lkml.kernel.org/r/87ftbf7gs3.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds

NeilBrown
2020-06-03 01:59:08 +0800
4c42be38c orangefs: use attach/detach_page_private

Now that the new pair of functions has been introduced, we can call them
to clean up the code in orangefs.
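
The message does not describe the helpers themselves. The following userspace sketch models what attach_page_private() and detach_page_private() are understood to do: take a page reference, store the pointer, and set the private flag, then undo all three. The mock_page type and the mock_ function names are hypothetical stand-ins, not kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the few struct page fields involved. */
struct mock_page {
	int refcount;          /* models the page reference count */
	bool has_private;      /* models the PG_private flag */
	unsigned long private; /* models page->private */
};

/* Model of attaching private data: pin the page, store the pointer,
 * and mark the page as carrying private data. */
static void mock_attach_page_private(struct mock_page *page, void *data)
{
	page->refcount++;
	page->private = (unsigned long)data;
	page->has_private = true;
}

/* Model of detaching: return the stored pointer and drop the reference,
 * or return NULL if nothing was attached. */
static void *mock_detach_page_private(struct mock_page *page)
{
	void *data = (void *)page->private;

	if (!page->has_private)
		return NULL;

	page->has_private = false;
	page->private = 0;
	page->refcount--;
	return data;
}
```

Pairing the two calls replaces the open-coded sequence of reference, flag, and pointer updates that each filesystem previously carried.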

Signed-off-by: Guoqing Jiang
Signed-off-by: Andrew Morton
Tested-by: Mike Marshall
Reviewed-by: Andrew Morton
Cc: Martin Brandenburg
Link: http://lkml.kernel.org/r/20200517214718.468-9-guoqing.jiang@cloud.ionos.com
Signed-off-by: Linus Torvalds

Guoqing Jiang
2020-06-03 01:59:08 +0800
14ed109e3 ntfs: replace attach_page_buffers with attach_page_private

Call the new function since attach_page_buffers will be removed.

Signed-off-by: Guoqing Jiang
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Cc: Anton Altaparmakov
Link: http://lkml.kernel.org/r/20200517214718.468-8-guoqing.jiang@cloud.ionos.com
Signed-off-by: Linus Torvalds

Guoqing Jiang
2020-06-03 01:59:07 +0800
58aeb7319 iomap: use attach/detach_page_private

Now that the new pair of functions has been introduced, we can call them
to clean up the code in iomap.

Signed-off-by: Guoqing Jiang
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Reviewed-by: Darrick J. Wong
Cc: Christoph Hellwig
Cc: Dave Chinner
Link: http://lkml.kernel.org/r/20200517214718.468-7-guoqing.jiang@cloud.ionos.com
Signed-off-by: Linus Torvalds

Guoqing Jiang
2020-06-03 01:59:07 +0800
7128cf9a2 f2fs: use attach/detach_page_private

Now that the new pair of functions has been introduced, we can call them
to clean up the code in f2fs.h.

Signed-off-by: Guoqing Jiang
Signed-off-by: Andrew Morton
Acked-by: Chao Yu
Cc: Jaegeuk Kim
Link: http://lkml.kernel.org/r/20200517214718.468-6-guoqing.jiang@cloud.ionos.com
Signed-off-by: Linus Torvalds

Guoqing Jiang
2020-06-03 01:59:07 +0800