Eric Lee / smarc-fsl-linux-kernel

30 Dec, 2020

1 commit

5e78c6bd9 inotify: convert to handle_inode_event() interface ... Browse Code »

commit 1a2620a99803ad660edc5d22fd9c66cce91ceb1c upstream.

Convert inotify to use the simple handle_inode_event() interface to
get rid of the code duplication between the generic helper
fsnotify_handle_event() and the inotify_handle_event() callback, which
also happen to be buggy code.

The bug will be fixed in the generic helper.

Link: https://lore.kernel.org/r/20201202120713.702387-3-amir73il@gmail.com
CC: stable@vger.kernel.org
Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations")
Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara
Signed-off-by: Greg Kroah-Hartman

Amir Goldstein
2020-12-30 18:54:18 +0800

19 Oct, 2020

1 commit

b87d8cefe mm, memcg: rework remote charging API to support nesting ... Browse Code »

Currently the remote memcg charging API consists of two functions:
memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the
memcg value, which overwrites the memcg of the current task.

memalloc_use_memcg(target_memcg);

memalloc_unuse_memcg();

It works perfectly for allocations performed from a normal context,
however an attempt to call it from an interrupt context or just nest two
remote charging blocks will lead to an incorrect accounting. On exit from
the inner block the active memcg will be cleared instead of being
restored.

memalloc_use_memcg(target_memcg);

memalloc_use_memcg(target_memcg_2);

memalloc_unuse_memcg();

Error: allocation here are charged to the memcg of the current
process instead of target_memcg.

memalloc_unuse_memcg();

This patch extends the remote charging API by switching to a single
function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg),
which sets the new value and returns the old one. So a remote charging
block will look like:

old_memcg = set_active_memcg(target_memcg);

set_active_memcg(old_memcg);

This patch is heavily based on the patch by Johannes Weiner, which can be
found here: https://lkml.org/lkml/2020/5/28/806 .

Signed-off-by: Roman Gushchin
Signed-off-by: Andrew Morton
Reviewed-by: Shakeel Butt
Cc: Johannes Weiner
Cc: Dan Schatzberg
Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com
Signed-off-by: Linus Torvalds

Roman Gushchin
2020-10-19 00:27:09 +0800

28 Jul, 2020

3 commits

957f7b472 inotify: do not set FS_EVENT_ON_CHILD in non-dir mark mask ... Browse Code »

FS_EVENT_ON_CHILD has currently no meaning for non-dir inode marks. In
the following patches we want to use that bit to mean that mark's
notification group cares about parent and name information. So stop
setting FS_EVENT_ON_CHILD for non-dir marks.

Link: https://lore.kernel.org/r/20200722125849.17418-3-amir73il@gmail.com
Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2020-07-28 05:16:16 +0800
c8f3446c6 inotify: report both events on parent and child with single callback ... Browse Code »

fsnotify usually calls inotify_handle_event() once for watching parent
to report event with child's name and once for watching child to report
event without child's name.

Do the same thing with a single callback instead of two callbacks when
marks iterator contains both inode and child entries.

Link: https://lore.kernel.org/r/20200716084230.30611-13-amir73il@gmail.com
Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2020-07-28 03:24:51 +0800
b54cecf5e fsnotify: pass dir argument to handle_event() callback ... Browse Code »

The 'inode' argument to handle_event(), sometimes referred to as
'to_tell' is somewhat obsolete.
It is a remnant from the times when a group could only have an inode mark
associated with an event.

We now pass an iter_info array to the callback, with all marks associated
with an event.

Most backends ignore this argument, with two exceptions:
1. dnotify uses it for sanity check that event is on directory
2. fanotify uses it to report fid of directory on directory entry
modification events

Remove the 'inode' argument and add a 'dir' argument.
The callback function signature is deliberately changed, because
the meaning of the argument has changed and the arguments have
been documented.

The 'dir' argument is set to when 'file_name' is specified and it is
referring to the directory that the 'file_name' entry belongs to.

Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2020-07-28 00:32:47 +0800

15 Jul, 2020

1 commit

956235afd inotify: do not use objectid when comparing events ... Browse Code »

inotify's event->wd is the object identifier.
Compare that instead of the common fsnotidy event objectid, so
we can get rid of the objectid field later.

Link: https://lore.kernel.org/r/20200708111156.24659-6-amir73il@gmail.com
Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2020-07-15 23:36:58 +0800

14 Jun, 2020

1 commit

a7f7f6248 treewide: replace '---help---' in Kconfig files with 'help' ... Browse Code »

Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.

There are a variety of indentation styles found.

a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the
following commend:

$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada

Masahiro Yamada
2020-06-14 00:57:21 +0800

05 Jun, 2020

1 commit

07c8f3bfe Merge tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull fsnotify updates from Jan Kara:
"Several smaller fixes and cleanups for fsnotify subsystem"

* tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix ignore mask logic for events on child and on dir
fanotify: don't write with size under sizeof(response)
fsnotify: Remove proc_fs.h include
fanotify: remove reference to fill_event_metadata()
fsnotify: add mutex destroy
fanotify: prefix should_merge()
fanotify: Replace zero-length array with flexible-array
inotify: Fix error return code assignment flow.
fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()

Linus Torvalds
2020-06-05 04:51:54 +0800

27 Apr, 2020

1 commit

7b26aa243 inotify: Fix error return code assignment flow. ... Browse Code »

If error code is initialized -EINVAL, there is no need to assign -EINVAL.

Link: https://lore.kernel.org/r/20200426143316.29877-1-her0gyugyu@gmail.com
Signed-off-by: youngjun
Signed-off-by: Jan Kara

youngjun
2020-04-27 18:14:56 +0800

21 Apr, 2020

1 commit

0c1bc6b84 docs: filesystems: fix renamed references ... Browse Code »

Some filesystem references got broken by a previous patch
series I submitted. Address those.

Signed-off-by: Mauro Carvalho Chehab
Acked-by: David Sterba # fs/affs/Kconfig
Link: https://lore.kernel.org/r/57318c53008dbda7f6f4a5a9e5787f4d37e8565a.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet

Mauro Carvalho Chehab
2020-04-21 05:45:22 +0800

24 Mar, 2020

2 commits

dfc2d2594 fsnotify: replace inode pointer with an object id ... Browse Code »

The event inode field is used only for comparison in queue merges and
cannot be dereferenced after handle_event(), because it does not hold a
refcount on the inode.

Replace it with an abstract id to do the same thing.

Link: https://lore.kernel.org/r/20200319151022.31456-8-amir73il@gmail.com
Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2020-03-24 18:28:00 +0800
aa93bdc55 fsnotify: use helpers to access data by data_type ... Browse Code »

Create helpers to access path and inode from different data types.

Link: https://lore.kernel.org/r/20200319151022.31456-5-amir73il@gmail.com
Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2020-03-24 01:19:06 +0800

24 Sep, 2019

1 commit

5825a95fe Merge tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux ... Browse Code »

Pull selinux updates from Paul Moore:

- Add LSM hooks, and SELinux access control hooks, for dnotify,
fanotify, and inotify watches. This has been discussed with both the
LSM and fs/notify folks and everybody is good with these new hooks.

- The LSM stacking changes missed a few calls to current_security() in
the SELinux code; we fix those and remove current_security() for
good.

- Improve our network object labeling cache so that we always return
the object's label, even when under memory pressure. Previously we
would return an error if we couldn't allocate a new cache entry, now
we always return the label even if we can't create a new cache entry
for it.

- Convert the sidtab atomic_t counter to a normal u32 with
READ/WRITE_ONCE() and memory barrier protection.

- A few patches to policydb.c to clean things up (remove forward
declarations, long lines, bad variable names, etc)

* tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
lsm: remove current_security()
selinux: fix residual uses of current_security() for the SELinux blob
selinux: avoid atomic_t usage in sidtab
fanotify, inotify, dnotify, security: add security hook for fs notifications
selinux: always return a secid from the network caches if we find one
selinux: policydb - rename type_val_to_struct_array
selinux: policydb - fix some checkpatch.pl warnings
selinux: shuffle around policydb.c to get rid of forward declarations

Linus Torvalds
2019-09-24 02:21:04 +0800

13 Aug, 2019

1 commit

ac5656d8a fanotify, inotify, dnotify, security: add security hook for fs notifications ... Browse Code »

As of now, setting watches on filesystem objects has, at most, applied a
check for read access to the inode, and in the case of fanotify, requires
CAP_SYS_ADMIN. No specific security hook or permission check has been
provided to control the setting of watches. Using any of inotify, dnotify,
or fanotify, it is possible to observe, not only write-like operations, but
even read access to a file. Modeling the watch as being merely a read from
the file is insufficient for the needs of SELinux. This is due to the fact
that read access should not necessarily imply access to information about
when another process reads from a file. Furthermore, fanotify watches grant
more power to an application in the form of permission events. While
notification events are solely, unidirectional (i.e. they only pass
information to the receiving application), permission events are blocking.
Permission events make a request to the receiving application which will
then reply with a decision as to whether or not that action may be
completed. This causes the issue of the watching application having the
ability to exercise control over the triggering process. Without drawing a
distinction within the permission check, the ability to read would imply
the greater ability to control an application. Additionally, mount and
superblock watches apply to all files within the same mount or superblock.
Read access to one file should not necessarily imply the ability to watch
all files accessed within a given mount or superblock.

In order to solve these issues, a new LSM hook is implemented and has been
placed within the system calls for marking filesystem objects with inotify,
fanotify, and dnotify watches. These calls to the hook are placed at the
point at which the target path has been resolved and are provided with the
path struct, the mask of requested notification events, and the type of
object on which the mark is being set (inode, superblock, or mount). The
mask and obj_type have already been translated into common FS_* values
shared by the entirety of the fs notification infrastructure. The path
struct is passed rather than just the inode so that the mount is available,
particularly for mount watches. This also allows for use of the hook by
pathname-based security modules. However, since the hook is intended for
use even by inode based security modules, it is not placed under the
CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security
modules would need to enable all of the path hooks, even though they do not
use any of them.

This only provides a hook at the point of setting a watch, and presumes
that permission to set a particular watch implies the ability to receive
all notification about that object which match the mask. This is all that
is required for SELinux. If other security modules require additional hooks
or infrastructure to control delivery of notification, these can be added
by them. It does not make sense for us to propose hooks for which we have
no implementation. The understanding that all notifications received by the
requesting application are all strictly of a type for which the application
has been granted permission shows that this implementation is sufficient in
its coverage.

Security modules wishing to provide complete control over fanotify must
also implement a security_file_open hook that validates that the access
requested by the watching application is authorized. Fanotify has the issue
that it returns a file descriptor with the file mode specified during
fanotify_init() to the watching process on event. This is already covered
by the LSM security_file_open hook if the security module implements
checking of the requested file mode there. Otherwise, a watching process
can obtain escalated access to a file for which it has not been authorized.

The selinux_path_notify hook implementation works by adding five new file
permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm
(descriptions about which will follow), and one new filesystem permission:
watch (which is applied to superblock checks). The hook then decides which
subset of these permissions must be held by the requesting application
based on the contents of the provided mask and the obj_type. The
selinux_file_open hook already checks the requested file mode and therefore
ensures that a watching process cannot escalate its access through
fanotify.

The watch, watch_mount, and watch_sb permissions are the baseline
permissions for setting a watch on an object and each are a requirement for
any watch to be set on a file, mount, or superblock respectively. It should
be noted that having either of the other two permissions (watch_reads and
watch_with_perm) does not imply the watch, watch_mount, or watch_sb
permission. Superblock watches further require the filesystem watch
permission to the superblock. As there is no labeled object in view for
mounts, there is no specific check for mount watches beyond watch_mount to
the inode. Such a check could be added in the future, if a suitable labeled
object existed representing the mount.

The watch_reads permission is required to receive notifications from
read-exclusive events on filesystem objects. These events include accessing
a file for the purpose of reading and closing a file which has been opened
read-only. This distinction has been drawn in order to provide a direct
indication in the policy for this otherwise not obvious capability. Read
access to a file should not necessarily imply the ability to observe read
events on a file.

Finally, watch_with_perm only applies to fanotify masks since it is the
only way to set a mask which allows for the blocking, permission event.
This permission is needed for any watch which is of this type. Though
fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit
trust to root, which we do not do, and does not support least privilege.

Signed-off-by: Aaron Goidel
Acked-by: Casey Schaufler
Acked-by: Jan Kara
Signed-off-by: Paul Moore

Aaron Goidel
2019-08-13 05:45:39 +0800

19 Jul, 2019

1 commit

eec4844fa proc/sysctl: add shared variables for range check ... Browse Code »

In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range. This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

$ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

# scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
Data old new delta
sysctl_vals - 12 +12
__kstrtab_sysctl_vals - 12 +12
max 14 10 -4
int_max 16 - -16
one 68 - -68
zero 128 28 -100
Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce
Signed-off-by: Arnd Bergmann
Acked-by: Kees Cook
Reviewed-by: Aaron Tomlin
Cc: Matthew Wilcox
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matteo Croce
2019-07-19 08:08:07 +0800

13 Jul, 2019

1 commit

ec1654509 memcg, fsnotify: no oom-kill for remote memcg charging ... Browse Code »

Commit d46eb14b735b ("fs: fsnotify: account fsnotify metadata to
kmemcg") added remote memcg charging for fanotify and inotify event
objects. The aim was to charge the memory to the listener who is
interested in the events but without triggering the OOM killer.
Otherwise there would be security concerns for the listener.

At the time, oom-kill trigger was not in the charging path. A parallel
work added the oom-kill back to charging path i.e. commit 29ef680ae7c2
("memcg, oom: move out_of_memory back to the charge path"). So to not
trigger oom-killer in the remote memcg, explicitly add
__GFP_RETRY_MAYFAIL to the fanotigy and inotify event allocations.

Link: http://lkml.kernel.org/r/20190514212259.156585-2-shakeelb@google.com
Signed-off-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Acked-by: Jan Kara
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: Michal Hocko
Cc: Amir Goldstein
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shakeel Butt
2019-07-13 02:05:43 +0800

24 May, 2019

1 commit

3e0a4e858 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 ... Browse Code »

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version this program is distributed in the hope that it will
be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner
Reviewed-by: Richard Fontana
Reviewed-by: Allison Randal
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190523091651.032047323@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-24 23:39:02 +0800

21 May, 2019

1 commit

ec8f24b7f treewide: Add SPDX license identifier - Makefile/Kconfig ... Browse Code »

Add SPDX license identifiers to all Make/Kconfig files which:

- Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

GPL-2.0-only

Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman

Thomas Gleixner
2019-05-21 16:50:46 +0800

08 May, 2019

1 commit

d27fb65bc Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc dcache updates from Al Viro:
"Most of this pile is putting name length into struct name_snapshot and
making use of it.

The beginning of this series ("ovl_lookup_real_one(): don't bother
with strlen()") ought to have been split in two (separate switch of
name_snapshot to struct qstr from overlayfs reaping the trivial
benefits of that), but I wanted to avoid a rebase - by the time I'd
spotted that it was (a) in -next and (b) close to 5.1-final ;-/"

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
audit_compare_dname_path(): switch to const struct qstr *
audit_update_watch(): switch to const struct qstr *
inotify_handle_event(): don't bother with strlen()
fsnotify: switch send_to_group() and ->handle_event to const struct qstr *
fsnotify(): switch to passing const struct qstr * for file_name
switch fsnotify_move() to passing const struct qstr * for old_name
ovl_lookup_real_one(): don't bother with strlen()
sysv: bury the broken "quietly truncate the long filenames" logics
nsfs: unobfuscate
unexport d_alloc_pseudo()

Linus Torvalds
2019-05-08 11:03:32 +0800

27 Apr, 2019

2 commits

ce163918c inotify_handle_event(): don't bother with strlen() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2019-04-27 01:55:21 +0800
e43e9c339 fsnotify: switch send_to_group() and ->handle_event to const struct qstr * ... Browse Code »

note that conditions surrounding accesses to dname in audit_watch_handle_event()
and audit_mark_handle_event() guarantee that dname won't have been NULL.

Signed-off-by: Al Viro

Al Viro
2019-04-27 01:51:03 +0800

19 Apr, 2019

1 commit

5dd50aaeb Make anon_inodes unconditional ... Browse Code »

Make the anon_inodes facility unconditional so that it can be used by core
VFS code and pidfd code.

Signed-off-by: David Howells
Signed-off-by: Al Viro
[christian@brauner.io: adapt commit message to mention pidfds]
Signed-off-by: Christian Brauner

David Howells
2019-04-19 20:03:11 +0800

11 Mar, 2019

1 commit

62c9d2674 inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch() ... Browse Code »

Commit 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for
inotify_add_watch()") forgot to call fsnotify_put_mark() with
IN_MASK_CREATE after fsnotify_find_mark()

Fixes: 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()")
Signed-off-by: ZhangXiaoxu
Signed-off-by: Jan Kara

ZhangXiaoxu
2019-03-11 17:13:17 +0800

07 Feb, 2019

1 commit

0a20df7ed fsnotify: report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events ... Browse Code »

We need to report FS_ISDIR flag with MOVE_SELF and DELETE_SELF events
for fanotify, because fanotify API requires the user to explicitly
request events on directories by FAN_ONDIR flag.

inotify never reported IN_ISDIR with those events. It looks like an
oversight, but to avoid the risk of breaking existing inotify programs,
mask the FS_ISDIR flag out when reprting those events to inotify backend.

We also add the FS_ISDIR flag with FS_ATTRIB event in the case of rename
over an empty target directory. inotify did not report IN_ISDIR in this
case, but it normally does report IN_ISDIR along with IN_ATTRIB event,
so in this case, we do not mask out the FS_ISDIR flag.

[JK: Simplify the checks in fsnotify_move()]

Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2019-02-07 23:38:35 +0800

06 Feb, 2019

1 commit

a0a92d261 fsnotify: move mask out of struct fsnotify_event ... Browse Code »

Common fsnotify_event helpers have no need for the mask field.
It is only used by backend code, so move the field out of the
abstract fsnotify_event struct and into the concrete backend
event structs.

This change packs struct inotify_event_info better on 64bit
machine and will allow us to cram some more fields into
struct fanotify_event_info.

Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2019-02-06 22:25:11 +0800

03 Jan, 2019

1 commit

125892edf inotify: Fix fd refcount leak in inotify_add_watch(). ... Browse Code »

Commit 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for
inotify_add_watch()") forgot to call fdput() before bailing out.

Fixes: 4d97f7d53da7dc83 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()")
CC: stable@vger.kernel.org
Signed-off-by: Tetsuo Handa
Reviewed-by: Amir Goldstein
Signed-off-by: Jan Kara

Tetsuo Handa
2019-01-03 01:28:37 +0800

04 Oct, 2018

1 commit

a39f7ec41 fsnotify: convert runtime BUG_ON() to BUILD_BUG_ON() ... Browse Code »

The BUG_ON() statements to verify number of bits in ALL_FSNOTIFY_BITS
and ALL_INOTIFY_BITS are converted to build time check of the constant.

Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2018-10-04 19:28:50 +0800

18 Aug, 2018

2 commits

6ada4e282 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge updates from Andrew Morton:

- a few misc things

- a few Y2038 fixes

- ntfs fixes

- arch/sh tweaks

- ocfs2 updates

- most of MM

* emailed patches from Andrew Morton : (111 commits)
mm/hmm.c: remove unused variables align_start and align_end
fs/userfaultfd.c: remove redundant pointer uwq
mm, vmacache: hash addresses based on pmd
mm/list_lru: introduce list_lru_shrink_walk_irq()
mm/list_lru.c: pass struct list_lru_node* as an argument to __list_lru_walk_one()
mm/list_lru.c: move locking from __list_lru_walk_one() to its caller
mm/list_lru.c: use list_lru_walk_one() in list_lru_walk_node()
mm, swap: make CONFIG_THP_SWAP depend on CONFIG_SWAP
mm/sparse: delete old sparse_init and enable new one
mm/sparse: add new sparse_init_nid() and sparse_init()
mm/sparse: move buffer init/fini to the common place
mm/sparse: use the new sparse buffer functions in non-vmemmap
mm/sparse: abstract sparse buffer allocations
mm/hugetlb.c: don't zero 1GiB bootmem pages
mm, page_alloc: double zone's batchsize
mm/oom_kill.c: document oom_lock
mm/hugetlb: remove gigantic page support for HIGHMEM
mm, oom: remove sleep from under oom_lock
kernel/dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()
mm/cma: remove unsupported gfp_mask parameter from cma_alloc()
...

Linus Torvalds
2018-08-18 07:49:31 +0800
d46eb14b7 fs: fsnotify: account fsnotify metadata to kmemcg ... Browse Code »

Patch series "Directed kmem charging", v8.

The Linux kernel's memory cgroup allows limiting the memory usage of the
jobs running on the system to provide isolation between the jobs. All
the kernel memory allocated in the context of the job and marked with
__GFP_ACCOUNT will also be included in the memory usage and be limited
by the job's limit.

The kernel memory can only be charged to the memcg of the process in
whose context kernel memory was allocated. However there are cases
where the allocated kernel memory should be charged to the memcg
different from the current processes's memcg. This patch series
contains two such concrete use-cases i.e. fsnotify and buffer_head.

The fsnotify event objects can consume a lot of system memory for large
or unlimited queues if there is either no or slow listener. The events
are allocated in the context of the event producer. However they should
be charged to the event consumer. Similarly the buffer_head objects can
be allocated in a memcg different from the memcg of the page for which
buffer_head objects are being allocated.

To solve this issue, this patch series introduces mechanism to charge
kernel memory to a given memcg. In case of fsnotify events, the memcg
of the consumer can be used for charging and for buffer_head, the memcg
of the page can be charged. For directed charging, the caller can use
the scope API memalloc_[un]use_memcg() to specify the memcg to charge
for all the __GFP_ACCOUNT allocations within the scope.

This patch (of 2):

A lot of memory can be consumed by the events generated for the huge or
unlimited queues if there is either no or slow listener. This can cause
system level memory pressure or OOMs. So, it's better to account the
fsnotify kmem caches to the memcg of the listener.

However the listener can be in a different memcg than the memcg of the
producer and these allocations happen in the context of the event
producer. This patch introduces remote memcg charging API which the
producer can use to charge the allocations to the memcg of the listener.

There are seven fsnotify kmem caches and among them allocations from
dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
inotify_inode_mark_cachep happens in the context of syscall from the
listener. So, SLAB_ACCOUNT is enough for these caches.

The objects from fsnotify_mark_connector_cachep are not accounted as
they are small compared to the notification mark or events and it is
unclear whom to account connector to since it is shared by all events
attached to the inode.

The allocations from the event caches happen in the context of the event
producer. For such caches we will need to remote charge the allocations
to the listener's memcg. Thus we save the memcg reference in the
fsnotify_group structure of the listener.

This patch has also moved the members of fsnotify_group to keep the size
same, at least for 64 bit build, even with additional member by filling
the holes.

[shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
Signed-off-by: Shakeel Butt
Acked-by: Johannes Weiner
Cc: Michal Hocko
Cc: Jan Kara
Cc: Amir Goldstein
Cc: Greg Thelen
Cc: Vladimir Davydov
Cc: Roman Gushchin
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shakeel Butt
2018-08-18 07:20:30 +0800

28 Jun, 2018

1 commit

4d97f7d53 inotify: Add flag IN_MASK_CREATE for inotify_add_watch() ... Browse Code »

The flag IN_MASK_CREATE is introduced as a flag for inotiy_add_watch()
which prevents inotify from modifying any existing watches when invoked.
If the pathname specified in the call has a watched inode associated
with it and IN_MASK_CREATE is specified, fail with an errno of EEXIST.

Use of IN_MASK_CREATE with IN_MASK_ADD is reserved for future use and
will return EINVAL.

RATIONALE

In the current implementation, there is no way to prevent
inotify_add_watch() from modifying existing watch descriptors. Even if
the caller keeps a record of all watch descriptors collected, this is
only sufficient to detect that an existing watch descriptor may have
been modified.

The assumption that a particular path will map to the same inode over
multiple calls to inotify_add_watch() cannot be made as files can be
renamed or deleted. It is also not possible to assume that two distinct
paths do no map to the same inode, due to hard-links or a dereferenced
symbolic link. Further uses of inotify_add_watch() to revert the change
may cause other watch descriptors to be modified or created, merely
compunding the problem. There is currently no system call such as
inotify_modify_watch() to explicity modify a watch descriptor, which
would be able to revert unwanted changes. Thus the caller cannot
guarantee to be able to revert any changes to existing watch decriptors.

Additionally the caller cannot assume that the events that are
associated with a watch descriptor are within the set requested, as any
future calls to inotify_add_watch() may unintentionally modify a watch
descriptor's mask. Thus it cannot currently be guaranteed that a watch
descriptor will only generate events which have been requested. The
program must filter events which come through its watch descriptor to
within its expected range.

Reviewed-by: Amir Goldstein
Signed-off-by: Henry Wilson
Signed-off-by: Jan Kara

Henry Wilson
2018-06-28 01:21:25 +0800

18 May, 2018

3 commits

b249f5be6 fsnotify: add fsnotify_add_inode_mark() wrappers ... Browse Code »

Before changing the arguments of the functions fsnotify_add_mark()
and fsnotify_add_mark_locked(), convert most callers to use a wrapper.

Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2018-05-18 20:58:22 +0800
47d9c7cc4 fsnotify: generalize iteration of marks by object type ... Browse Code »

Make some code that handles marks of object types inode and vfsmount
generic, so it can handle other object types.

Introduce fsnotify_foreach_obj_type macro to iterate marks by object type
and fsnotify_iter_{should|set}_report_type macros to set/test report_mask.

This is going to be used for adding mark of another object type
(super block mark).

Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2018-05-18 20:58:22 +0800
5b0457ad0 fsnotify: remove redundant arguments to handle_event() ... Browse Code »

inode_mark and vfsmount_mark arguments are passed to handle_event()
operation as function arguments as well as on iter_info struct.
The difference is that iter_info struct may contain marks that should
not be handled and are represented as NULL arguments to inode_mark or
vfsmount_mark.

Instead of passing the inode_mark and vfsmount_mark arguments, add
a report_mask member to iter_info struct to indicate which marks should
be handled, versus marks that should only be kept alive during user
wait.

This change is going to be used for passing more mark types
with handle_event() (i.e. super block marks).

Signed-off-by: Amir Goldstein
Signed-off-by: Jan Kara

Amir Goldstein
2018-05-18 20:58:22 +0800

06 Apr, 2018

1 commit

be88751f3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull misc filesystem updates from Jan Kara:
"udf, ext2, quota, fsnotify fixes & cleanups:

- udf fixes for handling of media without uid/gid

- udf fixes for some corner cases in parsing of volume recognition
sequence

- improvements of fsnotify handling of ENOMEM

- new ioctl to allow setting of watch descriptor id for inotify (for
checkpoint - restart)

- small ext2, reiserfs, quota cleanups"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Kill an unused extern entry form quota.h
reiserfs: Remove VLA from fs/reiserfs/reiserfs.h
udf: fix potential refcnt problem of nls module
ext2: change return code to -ENOMEM when failing memory allocation
udf: Do not mark possibly inconsistent filesystems as closed
fsnotify: Let userspace know about lost events due to ENOMEM
fanotify: Avoid lost events due to ENOMEM for unlimited queues
udf: Remove never implemented mount options
udf: Update mount option documentation
udf: Provide saner default for invalid uid / gid
udf: Clean up handling of invalid uid/gid
udf: Apply uid/gid mount options also to new inodes & chown
udf: Ignore [ug]id=ignore mount options
udf: Fix handling of Partition Descriptors
udf: Unify common handling of descriptors
udf: Convert descriptor index definitions to enum
udf: Allow volume descriptor sequence to be terminated by unrecorded block
udf: Simplify handling of Volume Descriptor Pointers
udf: Fix off-by-one in volume descriptor sequence length
inotify: Extend ioctl to allow to request id of new watch descriptor

Linus Torvalds
2018-04-06 10:17:50 +0800

03 Apr, 2018

1 commit

d0d89d1ed inotify: add do_inotify_init() helper; remove in-kernel call to syscall ... Browse Code »

Using the inotify-internal do_inotify_init() helper allows us to get rid
of the in-kernel call to sys_inotify_init1() syscall.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Acked-by: Jan Kara
Cc: Amir Goldstein
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Dominik Brodowski

Dominik Brodowski
2018-04-03 02:15:45 +0800

27 Feb, 2018

1 commit

7b1f64177 fsnotify: Let userspace know about lost events due to ENOMEM ... Browse Code »

Currently if notification event is lost due to event allocation failing
we ENOMEM, we just silently continue (except for fanotify permission
events where we deny the access). This is undesirable as userspace has
no way of knowing whether the notifications it got are complete or not.
Treat lost events due to ENOMEM the same way as lost events due to queue
overflow so that userspace knows something bad happened and it likely
needs to rescan the filesystem.

Reviewed-by: Amir Goldstein
Signed-off-by: Jan Kara

Jan Kara
2018-02-27 17:25:33 +0800

14 Feb, 2018

1 commit

e1603b6ef inotify: Extend ioctl to allow to request id of new watch descriptor ... Browse Code »

Watch descriptor is id of the watch created by inotify_add_watch().
It is allocated in inotify_add_to_idr(), and takes the numbers
starting from 1. Every new inotify watch obtains next available
number (usually, old + 1), as served by idr_alloc_cyclic().

CRIU (Checkpoint/Restore In Userspace) project supports inotify
files, and restores watched descriptors with the same numbers,
they had before dump. Since there was no kernel support, we
had to use cycle to add a watch with specific descriptor id:

while (1) {
int wd;

wd = inotify_add_watch(inotify_fd, path, mask);
if (wd < 0) {
break;
} else if (wd == desired_wd_id) {
ret = 0;
break;
}

inotify_rm_watch(inotify_fd, wd);
}

(You may find the actual code at the below link:
https://github.com/checkpoint-restore/criu/blob/v3.7/criu/fsnotify.c#L577)

The cycle is suboptiomal and very expensive, but since there is no better
kernel support, it was the only way to restore that. Happily, we had met
mostly descriptors with small id, and this approach had worked somehow.

But recent time containers with inotify with big watch descriptors
begun to come, and this way stopped to work at all. When descriptor id
is something about 0x34d71d6, the restoring process spins in busy loop
for a long time, and the restore hungs and delay of migration from node
to node could easily be watched.

This patch aims to solve this problem. It introduces new ioctl
INOTIFY_IOC_SETNEXTWD, which allows to request the number of next created
watch descriptor from userspace. It simply calls idr_set_cursor() primitive
to populate idr::idr_next, so that next idr_alloc_cyclic() allocation
will return this id, if it is not occupied. This is the way which is
used to restore some other resources from userspace. For example,
/proc/sys/kernel/ns_last_pid works the same for task pids.

The new code is under CONFIG_CHECKPOINT_RESTORE #define, so small system
may exclude it.

v2: Use INT_MAX instead of custom definition of max id,
as IDR subsystem guarantees id is between 0 and INT_MAX.

CC: Jan Kara
CC: Matthew Wilcox
CC: Andrew Morton
CC: Amir Goldstein
Signed-off-by: Kirill Tkhai
Reviewed-by: Cyrill Gorcunov
Reviewed-by: Matthew Wilcox
Reviewed-by: Andrew Morton
Signed-off-by: Jan Kara

Kirill Tkhai
2018-02-14 18:16:28 +0800

12 Feb, 2018

1 commit

a9a08845e vfs: do bulk POLL* -> EPOLL* replacement ... Browse Code »

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^$[^\"]*$$\$/\\1E\\2/" $f; done
done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2018-02-12 06:34:03 +0800

28 Nov, 2017

1 commit

076ccb76e fs: annotate ->poll() instances ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2017-11-28 05:20:05 +0800

15 Nov, 2017

1 commit

23281c803 Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull fsnotify updates from Jan Kara:

- fixes of use-after-tree issues when handling fanotify permission
events from Miklos

- refcount_t conversions from Elena

- fixes of ENOMEM handling in dnotify and fsnotify from me

* 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: convert fsnotify_mark.refcnt from atomic_t to refcount_t
fanotify: clean up CONFIG_FANOTIFY_ACCESS_PERMISSIONS ifdefs
fsnotify: clean up fsnotify()
fanotify: fix fsnotify_prepare_user_wait() failure
fsnotify: fix pinning group in fsnotify_prepare_user_wait()
fsnotify: pin both inode and vfsmount mark
fsnotify: clean up fsnotify_prepare/finish_user_wait()
fsnotify: convert fsnotify_group.refcnt from atomic_t to refcount_t
fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly
dnotify: Handle errors from fsnotify_add_mark_locked() in fcntl_dirnotify()

Linus Torvalds
2017-11-15 06:08:20 +0800