12 Jan, 2020

2 commits

  • [ Upstream commit 1edc8eb2e93130e36ac74ac9c80913815a57d413 ]

    When a filesystem is unmounted, we currently call fsnotify_sb_delete()
    before evict_inodes(), which means that fsnotify_unmount_inodes()
    must iterate over all inodes on the superblock looking for any inodes
    with watches. This is inefficient and can lead to livelocks as it
    iterates over many unwatched inodes.

    At this point, SB_ACTIVE is gone and dropping refcount to zero kicks
    the inode out immediately, so anything processed by
    fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.

    After that, the call to evict_inodes will evict everything else with a
    zero refcount.

    This should speed things up overall, and avoid livelocks in
    fsnotify_unmount_inodes().
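
    A userspace model of the ordering argument, for illustration only. It is not kernel code, and every type and helper name is hypothetical. The model assumes that an attached mark holds an inode reference, and that the intended change is to run the zero-refcount eviction before the mark walk, as the first paragraph implies. Under those assumptions the mark walk only visits inodes that are still in use.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_INODES 1024

/* Hypothetical stand-in for an in-memory inode; not struct inode. */
struct model_inode {
    int refcount;   /* i_count analogue; an attached mark holds one */
    bool has_mark;  /* an fsnotify watch is attached */
    bool evicted;   /* gone from the superblock's inode list */
};

/* evict_inodes() analogue: drop every inode nobody references. */
static void model_evict_unused(struct model_inode *inodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!inodes[i].evicted && inodes[i].refcount == 0)
            inodes[i].evicted = true;
}

/*
 * fsnotify_sb_delete() analogue: walk the inodes still on the list and
 * drop their marks. Returns how many inodes the walk had to visit.
 */
static size_t model_drop_marks(struct model_inode *inodes, size_t n)
{
    size_t visited = 0;

    for (size_t i = 0; i < n; i++) {
        if (inodes[i].evicted)
            continue;       /* not on the list, so not visited */
        visited++;
        if (inodes[i].has_mark) {
            inodes[i].has_mark = false;
            if (--inodes[i].refcount == 0)
                inodes[i].evicted = true;   /* !SB_ACTIVE: evicted at once */
        }
    }
    return visited;
}

/* Build n inodes (the first `watched` carry a mark) and tear them down. */
static size_t model_unmount(size_t n, size_t watched, bool evict_first)
{
    struct model_inode inodes[MODEL_MAX_INODES] = {0};
    size_t visited;

    if (n > MODEL_MAX_INODES)
        n = MODEL_MAX_INODES;
    for (size_t i = 0; i < watched && i < n; i++) {
        inodes[i].has_mark = true;
        inodes[i].refcount = 1;
    }
    if (evict_first) {
        model_evict_unused(inodes, n);
        visited = model_drop_marks(inodes, n);
    } else {
        visited = model_drop_marks(inodes, n);
        model_evict_unused(inodes, n);
    }
    return visited;
}
```

    With 1000 inodes of which 3 are watched, model_unmount(1000, 3, false) makes the mark walk visit all 1000 inodes, while model_unmount(1000, 3, true) makes it visit only the 3 watched ones.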

    Signed-off-by: Eric Sandeen
    Reviewed-by: Jan Kara
    Signed-off-by: Al Viro
    Signed-off-by: Sasha Levin

    Eric Sandeen
     
  • [ Upstream commit 04646aebd30b99f2cfa0182435a2ec252fcb16d0 ]

    Anything that walks all inodes on sb->s_inodes list without rescheduling
    risks softlockups.

    Previous efforts were made in 2 functions, see:

    c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
    ac05fbb inode: don't softlockup when evicting inodes

    but there hasn't been an audit of all walkers, so do that now. This
    also consistently moves the cond_resched() calls to the bottom of each
    loop in cases where it already exists.

    One loop remains: remove_dquot_ref(), because I'm not quite sure how
    to deal with that one w/o taking the i_lock.
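
    A userspace sketch of the loop shape being made consistent: the reschedule point is the last statement of the loop body, so it is reached on every iteration. sched_yield() stands in for cond_resched(), and walk_all(), counting_yield() and resched_calls are hypothetical names. A kernel walker must also drop s_inode_list_lock (and any i_lock) before rescheduling; this model has no locks.

```c
#include <sched.h>
#include <stddef.h>

/* Stand-in for cond_resched(); userspace can only offer to yield the CPU. */
typedef void (*resched_fn)(void);

static size_t resched_calls;  /* lets a caller check how often we yielded */

static void counting_yield(void)
{
    resched_calls++;
    (void)sched_yield();
}

/*
 * Walk every entry, doing work only for the nonzero ones. Because the
 * reschedule point sits at the bottom of the body, entries that need no
 * work still give the scheduler a chance to run something else.
 */
static size_t walk_all(const int *items, size_t n, resched_fn resched)
{
    size_t processed = 0;

    for (size_t i = 0; i < n; i++) {
        if (items[i] != 0)
            processed++;        /* placeholder for real per-entry work */
        resched();              /* bottom-of-loop reschedule point */
    }
    return processed;
}
```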

    Signed-off-by: Eric Sandeen
    Reviewed-by: Jan Kara
    Signed-off-by: Al Viro
    Signed-off-by: Sasha Levin

    Eric Sandeen
     

28 Sep, 2019

1 commit

  • Pull nfsd updates from Bruce Fields:
    "Highlights:

    - Add a new knfsd file cache, so that we don't have to open and close
    on each (NFSv2/v3) READ or WRITE. This can speed up read and write
    in some cases. It also replaces our readahead cache.

    - Prevent silent data loss on write errors, by treating write errors
    like server reboots for the purposes of write caching, thus forcing
    clients to resend their writes.

    - Tweak the code that allocates sessions to be more forgiving, so
    that NFSv4.1 mounts are less likely to hang when a server already
    has a lot of clients.

    - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be
    limited only by the backend filesystem and the maximum RPC size.

    - Allow the server to enforce use of the correct kerberos credentials
    when a client reclaims state after a reboot.

    And some miscellaneous smaller bugfixes and cleanup"

    * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits)
    sunrpc: clean up indentation issue
    nfsd: fix nfs read eof detection
    nfsd: Make nfsd_reset_boot_verifier_locked static
    nfsd: degraded slot-count more gracefully as allocation nears exhaustion.
    nfsd: handle drc over-allocation gracefully.
    nfsd: add support for upcall version 2
    nfsd: add a "GetVersion" upcall for nfsdcld
    nfsd: Reset the boot verifier on all write I/O errors
    nfsd: Don't garbage collect files that might contain write errors
    nfsd: Support the server resetting the boot verifier
    nfsd: nfsd_file cache entries should be per net namespace
    nfsd: eliminate an unnecessary acl size limit
    Deprecate nfsd fault injection
    nfsd: remove duplicated include from filecache.c
    nfsd: Fix the documentation for svcxdr_tmpalloc()
    nfsd: Fix up some unused variable warnings
    nfsd: close cached files prior to a REMOVE or RENAME that would replace target
    nfsd: rip out the raparms cache
    nfsd: have nfsd_test_lock use the nfsd_file cache
    nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache
    ...

    Linus Torvalds
     

24 Sep, 2019

1 commit

  • Pull selinux updates from Paul Moore:

    - Add LSM hooks, and SELinux access control hooks, for dnotify,
    fanotify, and inotify watches. This has been discussed with both the
    LSM and fs/notify folks and everybody is good with these new hooks.

    - The LSM stacking changes missed a few calls to current_security() in
    the SELinux code; we fix those and remove current_security() for
    good.

    - Improve our network object labeling cache so that we always return
    the object's label, even when under memory pressure. Previously we
    would return an error if we couldn't allocate a new cache entry, now
    we always return the label even if we can't create a new cache entry
    for it.

    - Convert the sidtab atomic_t counter to a normal u32 with
    READ/WRITE_ONCE() and memory barrier protection.

    - A few patches to policydb.c to clean things up (remove forward
    declarations, long lines, bad variable names, etc)

    * tag 'selinux-pr-20190917' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
    lsm: remove current_security()
    selinux: fix residual uses of current_security() for the SELinux blob
    selinux: avoid atomic_t usage in sidtab
    fanotify, inotify, dnotify, security: add security hook for fs notifications
    selinux: always return a secid from the network caches if we find one
    selinux: policydb - rename type_val_to_struct_array
    selinux: policydb - fix some checkpatch.pl warnings
    selinux: shuffle around policydb.c to get rid of forward declarations

    Linus Torvalds
     

19 Aug, 2019

1 commit


13 Aug, 2019

1 commit

  • As of now, setting watches on filesystem objects has, at most, applied a
    check for read access to the inode, and in the case of fanotify, requires
    CAP_SYS_ADMIN. No specific security hook or permission check has been
    provided to control the setting of watches. Using any of inotify, dnotify,
    or fanotify, it is possible to observe, not only write-like operations, but
    even read access to a file. Modeling the watch as being merely a read from
    the file is insufficient for the needs of SELinux. This is due to the fact
    that read access should not necessarily imply access to information about
    when another process reads from a file. Furthermore, fanotify watches grant
    more power to an application in the form of permission events. While
    notification events are solely unidirectional (i.e. they only pass
    information to the receiving application), permission events are blocking.
    Permission events make a request to the receiving application which will
    then reply with a decision as to whether or not that action may be
    completed. This causes the issue of the watching application having the
    ability to exercise control over the triggering process. Without drawing a
    distinction within the permission check, the ability to read would imply
    the greater ability to control an application. Additionally, mount and
    superblock watches apply to all files within the same mount or superblock.
    Read access to one file should not necessarily imply the ability to watch
    all files accessed within a given mount or superblock.

    In order to solve these issues, a new LSM hook is implemented and has been
    placed within the system calls for marking filesystem objects with inotify,
    fanotify, and dnotify watches. These calls to the hook are placed at the
    point at which the target path has been resolved and are provided with the
    path struct, the mask of requested notification events, and the type of
    object on which the mark is being set (inode, superblock, or mount). The
    mask and obj_type have already been translated into common FS_* values
    shared by the entirety of the fs notification infrastructure. The path
    struct is passed rather than just the inode so that the mount is available,
    particularly for mount watches. This also allows for use of the hook by
    pathname-based security modules. However, since the hook is intended for
    use even by inode based security modules, it is not placed under the
    CONFIG_SECURITY_PATH conditional. Otherwise, the inode-based security
    modules would need to enable all of the path hooks, even though they do not
    use any of them.

    This only provides a hook at the point of setting a watch, and presumes
    that permission to set a particular watch implies the ability to receive
    all notifications about that object that match the mask. This is all that
    is required for SELinux. If other security modules require additional hooks
    or infrastructure to control delivery of notification, these can be added
    by them. It does not make sense for us to propose hooks for which we have
    no implementation. The understanding that all notifications received by the
    requesting application are strictly of a type for which the application
    has been granted permission shows that this implementation is sufficient in
    its coverage.

    Security modules wishing to provide complete control over fanotify must
    also implement a security_file_open hook that validates that the access
    requested by the watching application is authorized. Fanotify has the issue
    that it returns a file descriptor with the file mode specified during
    fanotify_init() to the watching process on event. This is already covered
    by the LSM security_file_open hook if the security module implements
    checking of the requested file mode there. Otherwise, a watching process
    can obtain escalated access to a file for which it has not been authorized.

    The selinux_path_notify hook implementation works by adding five new file
    permissions: watch, watch_mount, watch_sb, watch_reads, and watch_with_perm
    (descriptions about which will follow), and one new filesystem permission:
    watch (which is applied to superblock checks). The hook then decides which
    subset of these permissions must be held by the requesting application
    based on the contents of the provided mask and the obj_type. The
    selinux_file_open hook already checks the requested file mode and therefore
    ensures that a watching process cannot escalate its access through
    fanotify.

    The watch, watch_mount, and watch_sb permissions are the baseline
    permissions for setting a watch on an object and each are a requirement for
    any watch to be set on a file, mount, or superblock respectively. It should
    be noted that having either of the other two permissions (watch_reads and
    watch_with_perm) does not imply the watch, watch_mount, or watch_sb
    permission. Superblock watches further require the filesystem watch
    permission to the superblock. As there is no labeled object in view for
    mounts, there is no specific check for mount watches beyond watch_mount to
    the inode. Such a check could be added in the future, if a suitable labeled
    object existed representing the mount.

    The watch_reads permission is required to receive notifications from
    read-exclusive events on filesystem objects. These events include accessing
    a file for the purpose of reading and closing a file which has been opened
    read-only. This distinction has been drawn in order to provide a direct
    indication in the policy for this otherwise not obvious capability. Read
    access to a file should not necessarily imply the ability to observe read
    events on a file.

    Finally, watch_with_perm only applies to fanotify masks since it is the
    only way to set a mask that allows for blocking permission events.
    This permission is needed for any watch which is of this type. Though
    fanotify requires CAP_SYS_ADMIN, this is insufficient as it gives implicit
    trust to root, which we do not do, and does not support least privilege.
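
    The permission selection described above can be sketched as a pure function of the mask and the object type. This is a hypothetical model, not the selinux_path_notify() source: the event bits and permission bits are made up for illustration, treating the access permission event as read-like is a modeling choice, and the additional filesystem watch check on the superblock is only noted in a comment.

```c
#include <stdint.h>

/* Hypothetical object types and event bits for this model only. */
enum model_obj_type { MODEL_OBJ_INODE, MODEL_OBJ_MOUNT, MODEL_OBJ_SB };

#define MODEL_EV_ACCESS        (1u << 0)  /* file read */
#define MODEL_EV_CLOSE_NOWRITE (1u << 1)  /* close of a read-only open */
#define MODEL_EV_MODIFY        (1u << 2)
#define MODEL_EV_OPEN_PERM     (1u << 3)  /* blocking permission event */
#define MODEL_EV_ACCESS_PERM   (1u << 4)  /* blocking and read-like */

#define MODEL_PERM_EVENTS (MODEL_EV_OPEN_PERM | MODEL_EV_ACCESS_PERM)
#define MODEL_READ_EVENTS \
    (MODEL_EV_ACCESS | MODEL_EV_CLOSE_NOWRITE | MODEL_EV_ACCESS_PERM)

/* Hypothetical bits named after the new SELinux file permissions. */
#define PERM_WATCH           (1u << 0)
#define PERM_WATCH_MOUNT     (1u << 1)
#define PERM_WATCH_SB        (1u << 2)
#define PERM_WATCH_READS     (1u << 3)
#define PERM_WATCH_WITH_PERM (1u << 4)

/* Return the file permissions a watcher must hold; 0 for a bad type. */
static uint32_t required_watch_perms(uint32_t mask, enum model_obj_type type)
{
    uint32_t perm;

    switch (type) {
    case MODEL_OBJ_INODE:
        perm = PERM_WATCH;
        break;
    case MODEL_OBJ_MOUNT:
        perm = PERM_WATCH_MOUNT;   /* no labeled mount object to check */
        break;
    case MODEL_OBJ_SB:
        perm = PERM_WATCH_SB;      /* plus filesystem watch on the sb */
        break;
    default:
        return 0;
    }
    if (mask & MODEL_PERM_EVENTS)
        perm |= PERM_WATCH_WITH_PERM;  /* watcher can block the target */
    if (mask & MODEL_READ_EVENTS)
        perm |= PERM_WATCH_READS;      /* watcher observes reads */
    return perm;
}
```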

    Signed-off-by: Aaron Goidel
    Acked-by: Casey Schaufler
    Acked-by: Jan Kara
    Signed-off-by: Paul Moore

    Aaron Goidel
     

19 Jul, 2019

1 commit

  • In the sysctl code the proc_dointvec_minmax() function is often used to
    validate the user supplied value between an allowed range. This
    function uses the extra1 and extra2 members from struct ctl_table as
    minimum and maximum allowed value.

    When a sysctl handler is declared, many source files define read-only
    variables holding a single integer whose address is assigned to the
    extra1 and extra2 members, so that the sysctl range is enforced.

    The special values 0, 1 and INT_MAX are very often used as range
    boundaries, leading to duplicated variables like zero=0, one=1,
    int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

    Add a const int array containing the most commonly used values, some
    macros to refer more easily to the correct array member, and use them
    instead of creating a local one for every object file.
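
    A userspace sketch of the shared-constants idea. The array contents { 0, 1, INT_MAX } come from the text; the macro names SYSCTL_ZERO, SYSCTL_ONE and SYSCTL_INT_MAX match what I recall from include/linux/sysctl.h but should be treated as an assumption. struct model_ctl_table and model_store_minmax() are hypothetical cut-down stand-ins for struct ctl_table and proc_dointvec_minmax().

```c
#include <limits.h>
#include <stdbool.h>

/* One shared copy of the common bounds (exported, not static, in-kernel). */
static const int sysctl_vals[] = { 0, 1, INT_MAX };

#define SYSCTL_ZERO    ((void *)&sysctl_vals[0])
#define SYSCTL_ONE     ((void *)&sysctl_vals[1])
#define SYSCTL_INT_MAX ((void *)&sysctl_vals[2])

/* Hypothetical cut-down ctl_table: only the members the range check uses. */
struct model_ctl_table {
    const char *procname;
    int *data;
    void *extra1;   /* points at the minimum, or NULL for no minimum */
    void *extra2;   /* points at the maximum, or NULL for no maximum */
};

/* Store val if it lies within [*extra1, *extra2]; false if out of range. */
static bool model_store_minmax(const struct model_ctl_table *t, int val)
{
    const int *min = t->extra1;
    const int *max = t->extra2;

    if ((min && val < *min) || (max && val > *max))
        return false;
    *t->data = val;
    return true;
}
```

    A boolean-style knob then needs no private zero and one variables: its table entry simply uses SYSCTL_ZERO and SYSCTL_ONE as extra1 and extra2.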

    This is the bloat-o-meter output comparing the old and new binary
    compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data                      old     new   delta
    sysctl_vals                 -      12     +12
    __kstrtab_sysctl_vals       -      12     +12
    max                        14      10      -4
    int_max                    16       -     -16
    one                        68       -     -68
    zero                      128      28    -100
    Total: Before=20583249, After=20583085, chg -0.00%

    [mcroce@redhat.com: tipc: remove two unused variables]
    Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
    [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
    [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
    Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
    [akpm@linux-foundation.org: fix fs/eventpoll.c]
    Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
    Signed-off-by: Matteo Croce
    Signed-off-by: Arnd Bergmann
    Acked-by: Kees Cook
    Reviewed-by: Aaron Tomlin
    Cc: Matthew Wilcox
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matteo Croce
     

13 Jul, 2019

1 commit

  • Commit d46eb14b735b ("fs: fsnotify: account fsnotify metadata to
    kmemcg") added remote memcg charging for fanotify and inotify event
    objects. The aim was to charge the memory to the listener who is
    interested in the events but without triggering the OOM killer.
    Otherwise there would be security concerns for the listener.

    At the time, oom-kill trigger was not in the charging path. A parallel
    work added the oom-kill back to charging path i.e. commit 29ef680ae7c2
    ("memcg, oom: move out_of_memory back to the charge path"). So to not
    trigger oom-killer in the remote memcg, explicitly add
    __GFP_RETRY_MAYFAIL to the fanotify and inotify event allocations.

    Link: http://lkml.kernel.org/r/20190514212259.156585-2-shakeelb@google.com
    Signed-off-by: Shakeel Butt
    Reviewed-by: Roman Gushchin
    Acked-by: Jan Kara
    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Amir Goldstein
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     

11 Jul, 2019

1 commit

  • Pull fsnotify updates from Jan Kara:
    "This contains cleanups of the fsnotify name removal hook and also a
    patch to disable fanotify permission events for 'proc' filesystem"

    * tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    fsnotify: get rid of fsnotify_nameremove()
    fsnotify: move fsnotify_nameremove() hook out of d_delete()
    configfs: call fsnotify_rmdir() hook
    debugfs: call fsnotify_{unlink,rmdir}() hooks
    debugfs: simplify __debugfs_remove_file()
    devpts: call fsnotify_unlink() hook
    tracefs: call fsnotify_{unlink,rmdir}() hooks
    rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks
    btrfs: call fsnotify_rmdir() hook
    fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
    fanotify: Disallow permission events for proc filesystem

    Linus Torvalds
     

20 Jun, 2019

1 commit

  • For all callers of fsnotify_{unlink,rmdir}(), we made sure that d_parent
    and d_name are stable. Therefore, fsnotify_{unlink,rmdir}() do not need
    the safety measures in fsnotify_nameremove() to stabilize parent and name.
    We can now simplify those hooks and get rid of fsnotify_nameremove().

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

19 Jun, 2019

1 commit

  • When implementing connector fsid cache, we only initialized the cache
    when the first mark added to object was added by FAN_REPORT_FID group.
    We forgot to update conn->fsid when the second mark is added by
    FAN_REPORT_FID group to an already attached connector without fsid
    cache.
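
    A hypothetical model of the connector fsid cache and the fix, for illustration; the types and model_add_mark() are not the fsnotify code. A NULL fsid argument stands for a mark from a group without FAN_REPORT_FID. The -EXDEV branch for a conflicting fsid is an assumption of the model, not something the text states.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* An all-zero fsid means "not cached yet" in this model. */
struct model_fsid { int val[2]; };

struct model_connector {
    struct model_fsid fsid;
    int nr_marks;
};

static bool fsid_is_set(const struct model_fsid *f)
{
    return f->val[0] || f->val[1];
}

/*
 * Attach a mark. The bug: the cache was filled only when the FIRST mark
 * came from a FAN_REPORT_FID group. The fix: any FID mark that finds the
 * cache still empty fills it.
 */
static int model_add_mark(struct model_connector *conn,
                          const struct model_fsid *fsid)
{
    if (fsid) {
        if (!fsid_is_set(&conn->fsid))
            conn->fsid = *fsid;         /* first FID mark on this object */
        else if (conn->fsid.val[0] != fsid->val[0] ||
                 conn->fsid.val[1] != fsid->val[1])
            return -EXDEV;              /* model choice: reject a mismatch */
    }
    conn->nr_marks++;
    return 0;
}
```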

    Reported-and-tested-by: syzbot+c277e8e2f46414645508@syzkaller.appspotmail.com
    Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector")
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

29 May, 2019

1 commit

  • Proc filesystem has special locking rules for various files. Thus
    fanotify which opens files on event delivery can easily deadlock
    against another process that waits for fanotify permission event to be
    handled. Since permission events on /proc have doubtful value anyway,
    just disallow them.
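
    A sketch of the kind of check this implies when a mark is added. The text does not say how /proc is identified; this model assumes a per-filesystem-type opt-out flag, and every name and bit value below is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag: this filesystem type refuses permission events. */
#define MODEL_FS_DISALLOW_NOTIFY_PERM (1u << 0)

/* Hypothetical event bits. */
#define MODEL_EV_OPEN        (1u << 0)
#define MODEL_EV_OPEN_PERM   (1u << 1)
#define MODEL_EV_ACCESS_PERM (1u << 2)
#define MODEL_PERM_EVENTS (MODEL_EV_OPEN_PERM | MODEL_EV_ACCESS_PERM)

struct model_fs_type {
    const char *name;
    uint32_t fs_flags;
};

/* Reject permission events on filesystems that opted out; allow the rest. */
static int model_check_mark(const struct model_fs_type *fs, uint32_t mask)
{
    if ((mask & MODEL_PERM_EVENTS) &&
        (fs->fs_flags & MODEL_FS_DISALLOW_NOTIFY_PERM))
        return -EINVAL;
    return 0;
}
```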

    Link: https://lore.kernel.org/linux-fsdevel/20190320131642.GE9485@quack2.suse.cz/
    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     

24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 or at your option any
    later version this program is distributed in the hope that it will
    be useful but without any warranty without even the implied warranty
    of merchantability or fitness for a particular purpose see the gnu
    general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 44 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190523091651.032047323@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

2 commits

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 or at your option any
    later version this program is distributed in the hope that it will
    be useful but without any warranty without even the implied warranty
    of merchantability or fitness for a particular purpose see the gnu
    general public license for more details you should have received a
    copy of the gnu general public license along with this program see
    the file copying if not write to the free software foundation 675
    mass ave cambridge ma 02139 usa

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 52 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Jilayne Lovejoy
    Reviewed-by: Steve Winslow
    Reviewed-by: Kate Stewart
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190519154042.342335923@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Add SPDX license identifiers to all Make/Kconfig files which:

    - Have no license information of any form

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

14 May, 2019

1 commit


09 May, 2019

1 commit

  • __fsnotify_parent() has an optimization in place to avoid unneeded
    take_dentry_name_snapshot(). When fsnotify_nameremove() was changed
    not to call __fsnotify_parent(), we left out the optimization.
    Kernel test robot reported a 5% performance regression in concurrent
    unlink() workload.
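
    The optimization amounts to a cheap check that skips the name snapshot when nobody watches the parent. A hypothetical userspace model: parent_watched stands in for whatever state the dentry carries, and take_name_snapshot() is a counting stub for take_dentry_name_snapshot().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dentry model. */
struct model_dentry {
    bool parent_watched;  /* does any watch on the parent care? */
    const char *name;
};

static size_t snapshots_taken;  /* counts the expensive step */

/* Stub for the costly name snapshot we want to avoid. */
static const char *take_name_snapshot(const struct model_dentry *d)
{
    snapshots_taken++;
    return d->name;
}

/* Returns true if an event would be delivered to the parent's watchers. */
static bool model_notify_parent_remove(const struct model_dentry *d)
{
    if (!d->parent_watched)
        return false;                 /* fast path: no snapshot at all */
    const char *name = take_name_snapshot(d);
    (void)name;                       /* would be passed with the event */
    return true;
}
```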

    Reported-by: kernel test robot
    Link: https://lore.kernel.org/lkml/20190505062153.GG29809@shao2-debian/
    Link: https://lore.kernel.org/linux-fsdevel/20190104090357.GD22409@quack2.suse.cz/
    Fixes: 5f02a8776384 ("fsnotify: annotate directory entry modification events")
    CC: stable@vger.kernel.org
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

08 May, 2019

2 commits

  • Pull misc dcache updates from Al Viro:
    "Most of this pile is putting name length into struct name_snapshot and
    making use of it.

    The beginning of this series ("ovl_lookup_real_one(): don't bother
    with strlen()") ought to have been split in two (separate switch of
    name_snapshot to struct qstr from overlayfs reaping the trivial
    benefits of that), but I wanted to avoid a rebase - by the time I'd
    spotted that it was (a) in -next and (b) close to 5.1-final ;-/"

    * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    audit_compare_dname_path(): switch to const struct qstr *
    audit_update_watch(): switch to const struct qstr *
    inotify_handle_event(): don't bother with strlen()
    fsnotify: switch send_to_group() and ->handle_event to const struct qstr *
    fsnotify(): switch to passing const struct qstr * for file_name
    switch fsnotify_move() to passing const struct qstr * for old_name
    ovl_lookup_real_one(): don't bother with strlen()
    sysv: bury the broken "quietly truncate the long filenames" logics
    nsfs: unobfuscate
    unexport d_alloc_pseudo()

    Linus Torvalds
     
  • Pull pidfd updates from Christian Brauner:
    "This patchset makes it possible to retrieve pidfds at process creation
    time by introducing the new flag CLONE_PIDFD to the clone() system
    call. Linus originally suggested to implement this as a new flag to
    clone() instead of making it a separate system call.

    After a thorough review from Oleg, CLONE_PIDFD returns pidfds in the
    parent_tidptr argument. This means we can give back the associated pid
    and the pidfd at the same time. Access to process metadata information
    thus becomes rather trivial.

    As has been agreed, CLONE_PIDFD creates file descriptors based on
    anonymous inodes similar to the new mount api. They are made
    unconditional by this patchset as they are now needed by core kernel
    code (vfs, pidfd) even more than they already were before (timerfd,
    signalfd, io_uring, epoll etc.). The core patchset is rather small.
    The bulky looking changelist is caused by David's very simple changes
    to Kconfig to make anon inodes unconditional.

    A pidfd comes with additional information in fdinfo if the kernel
    supports procfs. The fdinfo file contains the pid of the process in
    the caller's pid namespace in the same format as the procfs status
    file, i.e. "Pid:\t%d".

    To remove worries about missing metadata access this patchset comes
    with a sample/test program that illustrates how a combination of
    CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free
    access to process metadata through /proc/.

    Further work based on this patchset has been done by Joel. His work
    makes pidfds pollable. It finished too late for this merge window. I
    would prefer to have it sitting in linux-next for a while and send it
    for inclusion during the 5.3 merge window"
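
    To illustrate the fdinfo format quoted above, a small parser for the "Pid:\t%d" line. It assumes the caller has already read /proc/self/fdinfo/<fd> for the pidfd into a string; that read, and how the pidfd was obtained, are not shown, and parse_fdinfo_pid() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find the "Pid:\t<n>" line in fdinfo-style text. Returns 0 and stores
 * the pid on success, or -1 if there is no such line.
 */
static int parse_fdinfo_pid(const char *text, int *pid)
{
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, "Pid:\t", 5) == 0)
            return sscanf(line + 5, "%d", pid) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline to the next line */
    }
    return -1;
}
```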

    * tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    samples: show race-free pidfd metadata access
    signal: support CLONE_PIDFD with pidfd_send_signal
    clone: add CLONE_PIDFD
    Make anon_inodes unconditional

    Linus Torvalds
     

02 May, 2019

1 commit


29 Apr, 2019

1 commit

  • fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can
    happen that it sees mark not fully initialized or mark that is already
    detached from the object list. In these cases mark->connector
    can be NULL leading to NULL ptr dereference. Fix the problem by
    being careful when reading mark->connector and checking it for NULL.
    Also use WRITE_ONCE when writing the mark, just to prevent the compiler
    from doing something stupid.
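
    A userspace analogue of the fix, for illustration: read the pointer exactly once into a local, then test that local for NULL instead of dereferencing the shared field twice. C11 acquire/release atomics stand in for READ_ONCE()/WRITE_ONCE() here (they give stronger ordering than those macros), and all type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_fsid { int val[2]; };
struct model_connector { struct model_fsid fsid; };

/* The connector pointer may be published or cleared concurrently. */
struct model_mark { _Atomic(struct model_connector *) connector; };

/* Single load into a local; bail out if the mark is not attached. */
static bool model_get_fsid(struct model_mark *mark, struct model_fsid *out)
{
    struct model_connector *conn =
        atomic_load_explicit(&mark->connector, memory_order_acquire);

    if (!conn)
        return false;       /* mark is being created or destroyed */
    *out = conn->fsid;      /* uses the local, never re-reads the field */
    return true;
}

/* Publish or clear the pointer with a single store. */
static void model_set_connector(struct model_mark *mark,
                                struct model_connector *conn)
{
    atomic_store_explicit(&mark->connector, conn, memory_order_release);
}
```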

    Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com
    Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector")
    Signed-off-by: Jan Kara

    Jan Kara
     

27 Apr, 2019

4 commits


19 Apr, 2019

1 commit

  • Make the anon_inodes facility unconditional so that it can be used by core
    VFS code and pidfd code.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    [christian@brauner.io: adapt commit message to mention pidfds]
    Signed-off-by: Christian Brauner

    David Howells
     

19 Mar, 2019

1 commit

  • When file handle is embedded inside fanotify_event and usercopy checks
    are enabled, we get a warning like:

    Bad or missing usercopy whitelist? Kernel memory exposure attempt detected
    from SLAB object 'fanotify_event' (offset 40, size 8)!
    WARNING: CPU: 1 PID: 7649 at mm/usercopy.c:78 usercopy_warn+0xeb/0x110
    mm/usercopy.c:78

    Annotate the file handle in fanotify_event properly to mark that copying
    it to userspace is fine.

    Reported-by: syzbot+2c49971e251e36216d1f@syzkaller.appspotmail.com
    Fixes: a8b13aa20afb ("fanotify: enable FAN_REPORT_FID init flag")
    Signed-off-by: Kees Cook
    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     

11 Mar, 2019

1 commit


21 Feb, 2019

1 commit

  • Making waits for response to fanotify permission events interruptible
    can result in EINTR returns from open(2) or other syscalls when there's
    e.g. AV software that's monitoring the file. Orion reports that e.g.
    bash is complaining like:

    bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call

    So for now, convert the wait from an interruptible one to a killable one.
    That is mostly invisible to userspace. Sadly this breaks hibernation
    with fanotify permission events pending again but we have to put more
    thought into how to fix this without regressing userspace visible
    behavior.

    Reported-by: Orion Poplawski
    Signed-off-by: Jan Kara

    Jan Kara
     

18 Feb, 2019

6 commits

  • When waiting for response to fanotify permission events, we currently
    use uninterruptible waits. That makes the code simple; however, it can cause
    lots of processes to end up in uninterruptible sleep with hard reboot
    being the only alternative in case fanotify listener process stops
    responding (e.g. due to a bug in its implementation). Uninterruptible
    sleep also makes system hibernation fail if the listener gets frozen
    before the process generating fanotify permission event.

    Fix these problems by using interruptible sleep for waiting for response
    to fanotify event. This is slightly tricky though - we have to
    detect when the event got already reported to userspace as in that
    case we must not free the event. Instead we push the responsibility for
    freeing the event to the process that will write response to the
    event.
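
    A single-threaded sketch of the ownership hand-off described above. The states and helpers are hypothetical, and the locking mentioned in the text (group->notification_lock) is left out, so this only shows who frees the event in each case, not how the race is closed.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical lifecycle of a permission event in this model. */
enum perm_state {
    EV_QUEUED,    /* on the queue, not yet read by the listener */
    EV_REPORTED,  /* read by the listener, reply not yet written */
    EV_ANSWERED,  /* listener has written its reply */
    EV_CANCELED   /* waiter gave up; the listener must free it */
};

struct perm_event {
    enum perm_state state;
    int response;
};

/*
 * The waiter was interrupted before any reply arrived (state is EV_QUEUED
 * or EV_REPORTED). Returns true if the waiter freed the event itself.
 */
static bool waiter_abort(struct perm_event **evp)
{
    struct perm_event *ev = *evp;

    if (ev->state == EV_QUEUED) {
        free(ev);               /* nobody else has seen it */
        *evp = NULL;
        return true;
    }
    ev->state = EV_CANCELED;    /* listener still holds it: hand over */
    return false;
}

/* The listener writes its reply. Returns true if it had to free the event. */
static bool listener_respond(struct perm_event **evp, int response)
{
    struct perm_event *ev = *evp;

    if (ev->state == EV_CANCELED) {
        free(ev);
        *evp = NULL;
        return true;
    }
    ev->response = response;
    ev->state = EV_ANSWERED;
    return false;
}
```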

    Reported-by: Orion Poplawski
    Reported-by: Konstantin Khlebnikov
    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Track whether a permission event has already been reported to userspace
    and whether userspace has already answered the permission event. Protect
    stores to this field together with updates to ->response field by
    group->notification_lock. This will allow aborting wait for reply to
    permission event from userspace.

    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Simplify iteration cleaning access_list in fanotify_release(). That will
    make following changes more obvious.

    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Create function to remove event from the notification list. Later it will
    be used from more places.

    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     
  • get_one_event() has a single caller and that just locks
    notification_lock around the call. Move locking inside get_one_event()
    as that will make using ->response field for permission event state
    easier.

    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Fold dequeue_event() into process_access_response(). This will make
    changes to use of ->response field easier.

    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     

15 Feb, 2019

1 commit

  • Fanotify now uses exportfs_encode_inode_fh() so it needs to select
    EXPORTFS.

    Fixes: e9e0c8903009 "fanotify: encode file identifier for FAN_REPORT_FID"
    Reported-by: Randy Dunlap
    Signed-off-by: Jan Kara

    Jan Kara
     

07 Feb, 2019

4 commits

  • dirent modification events (create/delete/move) do not carry the
    child entry name/inode information. Instead, we report FAN_ONDIR
    for mkdir/rmdir so user can differentiate them from creat/unlink.

    This is consistent with inotify reporting IN_ISDIR with dirent events
    and is useful for implementing recursive directory tree watcher.

    We avoid merging dirent events referring to subdirs with dirent events
    referring to non subdirs, otherwise, user won't be able to tell from a
    mask FAN_CREATE|FAN_DELETE|FAN_ONDIR if it describes mkdir+unlink pair
    or rmdir+create pair of events.

    For backward compatibility and consistency, do not report FAN_ONDIR
    to user in legacy fanotify mode (reporting fd) and report FAN_ONDIR
    to user in FAN_REPORT_FID mode for all event types.
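
    The merge rule can be sketched as a comparison of the ONDIR bit of the queued event and the new one. The bit values and helpers are hypothetical, and any other merge conditions are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this model. */
#define M_CREATE (1u << 0)
#define M_DELETE (1u << 1)
#define M_ONDIR  (1u << 2)

/* Merge only if both events refer to subdirs or both to non-subdirs. */
static bool can_merge(uint32_t queued_mask, uint32_t new_mask)
{
    return (queued_mask & M_ONDIR) == (new_mask & M_ONDIR);
}

/* Fold the new event into the queued one; false means queue a new event. */
static bool try_merge(uint32_t *queued_mask, uint32_t new_mask)
{
    if (!can_merge(*queued_mask, new_mask))
        return false;
    *queued_mask |= new_mask;
    return true;
}
```

    Without the ONDIR comparison, mkdir followed by unlink would collapse into M_CREATE|M_DELETE|M_ONDIR, which is exactly the ambiguous mask described above.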

    Cc:
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Add support for events with data type FSNOTIFY_EVENT_INODE
    (e.g. create/attrib/move/delete) for inode and filesystem mark types.

    The "inode" events do not carry enough information (i.e. path) to
    report event->fd, so we do not allow setting a mask for those events
    unless group supports reporting fid.

    The "inode" events are not supported on a mount mark, because they do
    not carry enough information (i.e. path) to be filtered by mount point.

    The "dirent" events (create/move/delete) report the fid of the parent
    directory where events took place without specifying the filename of the
    child. In the future, fanotify may get support for reporting filename
    information for those events.
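
    The two restrictions above can be sketched as a validation step when a mark is added. The mark kinds, event bits and validate_mark() are hypothetical, and returning -EINVAL is an assumption of the model.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum mark_kind { MARK_INODE, MARK_MOUNT, MARK_FILESYSTEM };

/* Hypothetical bits: "path" events can report an fd, "inode" ones cannot. */
#define EV_OPEN   (1u << 0)   /* carries a path */
#define EV_ATTRIB (1u << 1)   /* inode event */
#define EV_CREATE (1u << 2)   /* inode (dirent) event */
#define INODE_EVENTS (EV_ATTRIB | EV_CREATE)

static int validate_mark(uint32_t mask, enum mark_kind kind, bool report_fid)
{
    if ((mask & INODE_EVENTS) && kind == MARK_MOUNT)
        return -EINVAL;   /* no path, so cannot filter by mount point */
    if ((mask & INODE_EVENTS) && !report_fid)
        return -EINVAL;   /* no path, so no event->fd to report */
    return 0;
}
```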

    Cc:
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • When event data type is FSNOTIFY_EVENT_INODE, we don't have a reference
    to the mount, so we will not be able to open a file descriptor when user
    reads the event. However, if the listener has enabled reporting file
    identifier with the FAN_REPORT_FID init flag, we allow reporting those
    events and we use an identifier inode to encode fid.

    The inode to use as identifier when reporting fid depends on the event.
    For dirent modification events, we report the modified directory inode
    and we report the "victim" inode otherwise.
    For example:
    FS_ATTRIB reports the child inode even if reported on a watched parent.
    FS_CREATE reports the modified dir inode and not the created inode.

    [JK: Fixup condition in fanotify_group_event_mask()]

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • All fsnotify hooks set the FS_ISDIR flag for events that happen
    on directory victim inodes except for fsnotify_perm().

    Add the missing FS_ISDIR flag in fsnotify_perm() hook and let
    fanotify_group_event_mask() check the FS_ISDIR flag instead of
    checking if path argument is a directory.

    This is needed for fanotify support for event types that do not
    carry path information.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein