17 Jan, 2021

1 commit

  • commit 2ca408d9c749c32288bc28725f9f12ba30299e8f upstream.

    Commit

    121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")

    converted native x86-32 syscalls which take 64-bit arguments to use the
    compat handlers to allow conversion to passing args via pt_regs.
    sys_fanotify_mark() was however missed, as it has a general compat
    handler. Add a config option that will use the syscall wrapper that
    takes the split args for native 32-bit.

    [ bp: Fix typo in Kconfig help text. ]

    Fixes: 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
    Reported-by: Paweł Jasiak
    Signed-off-by: Brian Gerst
    Signed-off-by: Borislav Petkov
    Acked-by: Jan Kara
    Acked-by: Andy Lutomirski
    Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Brian Gerst
     

30 Dec, 2020

1 commit

  • commit fecc4559780d52d174ea05e3bf543669165389c3 upstream.

    fsnotify_parent() used to send two separate events to backends when a
    parent inode is watching children and the child inode is also watching.
    In an attempt to avoid duplicate events in fanotify, we unified the two
    backend callbacks to a single callback and handled the reporting of the
    two separate events for the relevant backends (inotify and dnotify).
    However the handling is buggy and can result in inotify and dnotify
    listeners receiving events of the type they never asked for or spurious
    events.

    The problem is the unified event callback with two inode marks (parent and
    child) is called when any of the parent and child inodes are watched and
    interested in the event, but the parent inode's mark that is interested
    in the event on the child is not necessarily the one we are currently
    reporting to (it could belong to a different group).

    So before reporting the parent or child event flavor to backend we need
    to check that the mark is really interested in that event flavor.

    The semantics of INODE and CHILD marks were hard to follow and made the
    logic more complicated than it should have been. Replace it with INODE
    and PARENT marks semantics to hopefully make the logic more clear.

    Thanks to Hugh Dickins for spotting a bug in the earlier version of this
    patch.

    Fixes: 497b0c5a7c06 ("fsnotify: send event to parent and child with single callback")
    CC: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20201202120713.702387-4-amir73il@gmail.com
    Reported-by: Hugh Dickins
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Amir Goldstein
     

19 Oct, 2020

1 commit

  • Currently the remote memcg charging API consists of two functions:
    memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the
    memcg value, which overwrites the memcg of the current task.

    memalloc_use_memcg(target_memcg);

    memalloc_unuse_memcg();

    It works perfectly for allocations performed from a normal context,
    however an attempt to call it from an interrupt context or just nest two
    remote charging blocks will lead to an incorrect accounting. On exit from
    the inner block the active memcg will be cleared instead of being
    restored.

    memalloc_use_memcg(target_memcg);

    memalloc_use_memcg(target_memcg_2);

    memalloc_unuse_memcg();

    Error: allocations here are charged to the memcg of the current
    process instead of target_memcg.

    memalloc_unuse_memcg();

    This patch extends the remote charging API by switching to a single
    function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg),
    which sets the new value and returns the old one. So a remote charging
    block will look like:

    old_memcg = set_active_memcg(target_memcg);

    set_active_memcg(old_memcg);
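
    The difference between the two API shapes can be tried out in user
    space. The following is only a sketch under stated assumptions: the
    struct mem_cgroup stand-in, the global active_memcg slot and the two
    nested_with_*() helpers are hypothetical and merely mimic the
    set/clear and set/restore behavior described above; they are not the
    kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mem_cgroup. */
struct mem_cgroup {
    const char *name;
};

/* Stand-in for the "active memcg" slot that the real helpers manipulate. */
static struct mem_cgroup *active_memcg;

/* Old API shape: the "unuse" side always clears the slot. */
static void memalloc_use_memcg(struct mem_cgroup *memcg)
{
    active_memcg = memcg;
}

static void memalloc_unuse_memcg(void)
{
    active_memcg = NULL;
}

/* New API shape: install a value and hand back the previous one. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
    struct mem_cgroup *old = active_memcg;

    active_memcg = memcg;
    return old;
}

/* Returns what is active right after the inner block ends (old API). */
static struct mem_cgroup *nested_with_old_api(struct mem_cgroup *outer,
                                              struct mem_cgroup *inner)
{
    struct mem_cgroup *seen;

    memalloc_use_memcg(outer);
    memalloc_use_memcg(inner);
    memalloc_unuse_memcg();     /* clears the slot, outer is lost */
    seen = active_memcg;
    memalloc_unuse_memcg();
    return seen;
}

/* Returns what is active right after the inner block ends (new API). */
static struct mem_cgroup *nested_with_new_api(struct mem_cgroup *outer,
                                              struct mem_cgroup *inner)
{
    struct mem_cgroup *old_outer, *old_inner, *seen;

    old_outer = set_active_memcg(outer);
    old_inner = set_active_memcg(inner);
    set_active_memcg(old_inner);    /* restores outer */
    seen = active_memcg;
    set_active_memcg(old_outer);    /* restores the original value */
    return seen;
}
```

    With the old shape the outer target is gone once the inner block
    ends; with the save/restore shape it is still in place.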

    This patch is heavily based on the patch by Johannes Weiner, which can be
    found here: https://lkml.org/lkml/2020/5/28/806 .

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Cc: Johannes Weiner
    Cc: Dan Schatzberg
    Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
     

24 Aug, 2020

1 commit

  • Replace the existing /* fall through */ comments and their variants with
    the new pseudo-keyword macro fallthrough[1]. Also, remove fall-through
    markings where they are unnecessary.

    [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
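
    For readers unfamiliar with the pseudo-keyword, here is a minimal
    user-space sketch. The macro definition below is an assumption
    modelled on the compiler attribute, with a no-op fallback for
    compilers that lack it, and pending_steps() is a hypothetical
    function used only for illustration.

```c
#include <assert.h>

/* Assumed definition: use the statement attribute when available. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)    /* fallthrough */
#endif

/* Hypothetical example: level 2 implies the work of level 1 as well. */
static int pending_steps(int level)
{
    int steps = 0;

    switch (level) {
    case 2:
        steps += 2;
        fallthrough;    /* deliberate: continue into case 1 */
    case 1:
        steps += 1;
        break;
    default:
        break;
    }
    return steps;
}
```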

    Signed-off-by: Gustavo A. R. Silva

    Gustavo A. R. Silva
     

28 Jul, 2020

18 commits

  • When merging name events, fsids of the two involved events have to
    match. Otherwise we could merge events from two different filesystems
    and thus effectively lose the second event.

    Backporting note: Although the commit cacfb956d46e introducing this bug
    was merged for 5.7, the relevant code didn't get used in the end until
    7e8283af6ede ("fanotify: report parent fid + name + child fid") which
    will be merged with this patch. So there's no need for backporting this.

    Fixes: cacfb956d46e ("fanotify: record name info for FAN_DIR_MODIFY event")
    Reported-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Add support for FAN_REPORT_FID | FAN_REPORT_DIR_FID.
    Internally, it is implemented as a private case of reporting both
    parent and child fids and name, the parent and child fids are recorded
    in a variable length fanotify_name_event, but there is no name.

    It should be noted that directory modification events are recorded
    in fixed size fanotify_fid_event when not reporting name, just like
    with group flags FAN_REPORT_FID.

    Link: https://lore.kernel.org/r/20200716084230.30611-23-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • For a group with fanotify_init() flag FAN_REPORT_DFID_NAME, the parent
    fid and name are reported for events on non-directory objects with an
    info record of type FAN_EVENT_INFO_TYPE_DFID_NAME.

    If the group also has the init flag FAN_REPORT_FID, the child fid
    is also reported with another info record that follows the first info
    record. The second info record is the same info record that would have
    been reported to a group with only FAN_REPORT_FID flag.

    When the child fid needs to be recorded, the variable size struct
    fanotify_name_event is preallocated with enough space to store the
    child fh between the dir fh and the name.

    Link: https://lore.kernel.org/r/20200716084230.30611-22-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Introduce a new fanotify_init() flag FAN_REPORT_NAME. It requires the
    flag FAN_REPORT_DIR_FID and there is a constant for setting both flags
    named FAN_REPORT_DFID_NAME.

    For a group with flag FAN_REPORT_NAME, the parent fid and name are
    reported for directory entry modification events (create/delete/move)
    and for events on non-directory objects.

    Events on directories themselves are reported with their own fid and
    "." as the name.

    The parent fid and name are reported with an info record of type
    FAN_EVENT_INFO_TYPE_DFID_NAME, similar to the way that parent fid is
    reported with info type FAN_EVENT_INFO_TYPE_DFID, but with an appended
    null terminated name string.
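
    A listener could locate the appended name roughly as sketched below.
    This is an illustration under assumptions: the sk_* structs and the
    SK_INFO_TYPE_DFID_NAME value are local mirrors written for this
    example (real code would use the definitions from <linux/fanotify.h>
    and struct file_handle from <fcntl.h>), and sk_build_record() exists
    only to fabricate a record for demonstration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors for illustration only, not the uapi headers. */
#define SK_INFO_TYPE_DFID_NAME 2

struct sk_info_header {
    uint8_t info_type;
    uint8_t pad;
    uint16_t len;               /* total record length in bytes */
};

struct sk_fsid {
    int32_t val[2];
};

struct sk_file_handle {
    uint32_t handle_bytes;      /* size of f_handle[] */
    int32_t handle_type;
    unsigned char f_handle[];
};

struct sk_info_fid {
    struct sk_info_header hdr;
    struct sk_fsid fsid;
    unsigned char handle[];     /* a struct sk_file_handle, then the name */
};

/* Return the name stored after the directory file handle, or NULL. */
static const char *sk_dfid_name(const void *record, size_t avail)
{
    const unsigned char *p = record;
    struct sk_info_header hdr;
    struct sk_file_handle fh;
    size_t name_off;

    if (avail < sizeof(struct sk_info_fid) + sizeof(fh))
        return NULL;
    memcpy(&hdr, p, sizeof(hdr));
    if (hdr.info_type != SK_INFO_TYPE_DFID_NAME || hdr.len > avail)
        return NULL;
    memcpy(&fh, p + sizeof(struct sk_info_fid), sizeof(fh));
    name_off = sizeof(struct sk_info_fid) + sizeof(fh) + fh.handle_bytes;
    if (name_off >= hdr.len)
        return NULL;
    /* The name must be null terminated inside the record. */
    if (!memchr(p + name_off, '\0', hdr.len - name_off))
        return NULL;
    return (const char *)p + name_off;
}

/* Demo helper: fabricate a record in buf; returns its length or 0. */
static size_t sk_build_record(unsigned char *buf, size_t cap,
                              const unsigned char *handle,
                              uint32_t handle_bytes, const char *name)
{
    struct sk_info_fid fid = { { SK_INFO_TYPE_DFID_NAME, 0, 0 }, { { 1, 2 } } };
    struct sk_file_handle fh = { handle_bytes, 0 };
    size_t name_len = strlen(name) + 1;
    size_t total = sizeof(fid) + sizeof(fh) + handle_bytes + name_len;

    if (total > cap || total > UINT16_MAX)
        return 0;
    fid.hdr.len = (uint16_t)total;
    memcpy(buf, &fid, sizeof(fid));
    memcpy(buf + sizeof(fid), &fh, sizeof(fh));
    if (handle_bytes)
        memcpy(buf + sizeof(fid) + sizeof(fh), handle, handle_bytes);
    memcpy(buf + sizeof(fid) + sizeof(fh) + handle_bytes, name, name_len);
    return total;
}
```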

    Link: https://lore.kernel.org/r/20200716084230.30611-21-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • In a group with flag FAN_REPORT_DIR_FID, when adding an inode mark with
    FAN_EVENT_ON_CHILD, events on non-directory children are reported with
    the fid of the parent.

    When adding a filesystem or mount mark or mark on a non-dir inode, we
    want to report events that are "possible on child" (e.g. open/close)
    also with fid of the parent, as if the victim inode's parent is
    interested in events "on child".

    Some events, currently only FAN_MOVE_SELF, should be reported to a
    sb/mount/non-dir mark with parent fid even though they are not
    reported to a watching parent.

    To get the desired behavior we set the flag FAN_EVENT_ON_CHILD on
    all the sb/mount/non-dir mark masks in a group with FAN_REPORT_DIR_FID.

    Link: https://lore.kernel.org/r/20200716084230.30611-20-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • For now, the FAN_REPORT_DIR_FID flag is mutually exclusive with
    FAN_REPORT_FID.
    Events include a single info record of type FAN_EVENT_INFO_TYPE_DFID
    with a directory file handle.

    For now, events are only reported for:
    - Directory modification events
    - Events on children of a watching directory
    - Events on directory objects

    Soon, we will add support for reporting the parent directory fid
    for events on non-directories with filesystem/mount mark and
    support for reporting both parent directory fid and child fid.

    Link: https://lore.kernel.org/r/20200716084230.30611-19-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Instead of calling fsnotify() twice, once with parent inode and once
    with child inode, if event should be sent to parent inode, send it
    with both parent and child inodes marks in object type iterator and call
    the backend handle_event() callback only once.

    The parent inode is assigned to the standard "inode" iterator type and
    the child inode is assigned to the special "child" iterator type.

    In that case, the bit FS_EVENT_ON_CHILD will be set in the event mask,
    the dir argument to handle_event will be the parent inode, the file_name
    argument to handle_event is non NULL and refers to the name of the child
    and the child inode can be accessed with fsnotify_data_inode().

    This will allow fanotify to make decisions based on child or parent's
    ignored mask. For example, when a parent is interested in a specific
    event on its children, but a specific child wishes to ignore this event,
    the event will not be reported. This is not what happens with current
    code, but according to man page, it is the expected behavior.

    Link: https://lore.kernel.org/r/20200716084230.30611-15-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • The fanotify_fh struct has an inline buffer of size 12 which is enough
    to store the most common local filesystem file handles (e.g. ext4, xfs).
    For file handles that do not fit in the inline buffer (e.g. btrfs), an
    external buffer is allocated to store the file handle.

    When allocating a variable size fanotify_name_event, there is no point
    in allocating also an external fh buffer when file handle does not fit
    in the inline buffer.

    Check required size for encoding fh, preallocate an event buffer
    sufficient to contain both file handle and name and store the name after
    the file handle.

    At this time, when not reporting name in event, we still allocate
    the fixed size fanotify_fid_event and an external buffer for large
    file handles, but fanotify_alloc_name_event() has already been prepared
    to accept a NULL file_name.

    Link: https://lore.kernel.org/r/20200716084230.30611-11-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • An fanotify event name is always recorded relative to a dir fh.
    Encapsulate the name_len member of fanotify_name_event in a new struct
    fanotify_info, which describes the parceling of the variable size
    buffer of an fanotify_name_event.

    The dir_fh member of fanotify_name_event is renamed to _dir_fh and is not
    accessed directly, but via the fanotify_info_dir_fh() accessor.
    Although the dir_fh len information is already available in struct
    fanotify_fh, we store it also in dir_fh_totlen member of fanotify_info,
    including the size of fanotify_fh header, so we know the offset of the
    name in the buffer without looking inside the dir_fh.

    We also add a file_fh_totlen member to allow packing another file handle
    in the variable size buffer after the dir_fh and before the name.
    We are going to use that space to store the child fid.
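
    The parceling of the buffer can be pictured with the user-space
    sketch below. It is an assumption-laden illustration: struct
    sk_info, its accessors and sk_info_alloc() are hypothetical and
    simplified, and the real struct fanotify_info may differ in member
    types and layout. Only the idea of [dir fh][file fh][name] packed
    behind three length fields is taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified mirror of the layout described above. */
struct sk_info {
    unsigned char dir_fh_totlen;    /* bytes used by the dir fh */
    unsigned char file_fh_totlen;   /* bytes used by the optional file fh */
    unsigned char name_len;         /* name length without the terminator */
    unsigned char buf[];            /* [dir fh][file fh][name\0] */
};

static unsigned char *sk_info_dir_fh(struct sk_info *info)
{
    return info->buf;
}

static unsigned char *sk_info_file_fh(struct sk_info *info)
{
    return info->buf + info->dir_fh_totlen;
}

/* The name offset follows from the two lengths alone. */
static char *sk_info_name(struct sk_info *info)
{
    return (char *)info->buf + info->dir_fh_totlen + info->file_fh_totlen;
}

/* One allocation sized for both handles and the name. */
static struct sk_info *sk_info_alloc(const void *dir_fh, size_t dir_len,
                                     const void *file_fh, size_t file_len,
                                     const char *name)
{
    size_t name_len = name ? strlen(name) : 0;
    struct sk_info *info;

    if (dir_len > 255 || file_len > 255 || name_len > 255)
        return NULL;
    info = malloc(sizeof(*info) + dir_len + file_len + name_len + 1);
    if (!info)
        return NULL;
    info->dir_fh_totlen = (unsigned char)dir_len;
    info->file_fh_totlen = (unsigned char)file_len;
    info->name_len = (unsigned char)name_len;
    if (dir_len)
        memcpy(sk_info_dir_fh(info), dir_fh, dir_len);
    if (file_len)
        memcpy(sk_info_file_fh(info), file_fh, file_len);
    if (name_len)
        memcpy(sk_info_name(info), name, name_len);
    sk_info_name(info)[name_len] = '\0';
    return info;
}
```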

    Link: https://lore.kernel.org/r/20200716084230.30611-10-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Up to now, fanotify allowed setting the FAN_EVENT_ON_CHILD flag on
    sb/mount marks and non-directory inode mask, but the flag was ignored.

    Mask out the flag if it is provided by user on sb/mount/non-dir marks
    and define it as an implicit flag that cannot be removed by user.

    This flag is going to be used internally to request for events with
    parent and name info.

    Link: https://lore.kernel.org/r/20200716084230.30611-8-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • So far, all flags that can be set in an fanotify mark mask can be set
    explicitly by a call to fanotify_mark(2).

    Prepare for defining implicit event flags that cannot be set by user with
    fanotify_mark(2), similar to how inotify/dnotify implicitly set the
    FS_EVENT_ON_CHILD flag.

    Implicit event flags cannot be removed by user and mark gets destroyed
    when only implicit event flags remain in the mask.

    Link: https://lore.kernel.org/r/20200716084230.30611-7-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • The special event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) never had
    any meaning in ignored mask. Mask them out explicitly.

    Link: https://lore.kernel.org/r/20200716084230.30611-6-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • As preparation for new flags that report fids, define a bit set
    of flags for a group reporting fids, currently containing the
    only bit FAN_REPORT_FID.

    Link: https://lore.kernel.org/r/20200716084230.30611-5-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • In fanotify_encode_fh(), both cases of NULL inode and failure to encode
    ended up with fh type FILEID_INVALID.

    Distinguish the case of NULL inode, by setting fh type to FILEID_ROOT.
    This is just a semantic difference at this point.

    Remove stale comment and unneeded check from fid event compare helpers.

    Link: https://lore.kernel.org/r/20200716084230.30611-4-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • An event on directory should never be merged with an event on
    non-directory regardless of the event struct type.

    This change has no visible effect, because currently, with struct
    fanotify_path_event, the relevant events will not be merged because
    event path of dir will be different than event path of non-dir.

    Link: https://lore.kernel.org/r/20200716084230.30611-3-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • In fanotify_group_event_mask() there is logic in place to make sure we
    are not going to handle an event with no type and just FAN_ONDIR flag.
    Generalize this logic to any FANOTIFY_EVENT_FLAGS.

    There is only one more flag in this group at the moment -
    FAN_EVENT_ON_CHILD. We never report it to user, but we do pass it in to
    fanotify_alloc_event() when group is reporting fid as indication that
    event happened on child. We will have use for this indication later on.

    Link: https://lore.kernel.org/r/20200716084230.30611-2-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • FAN_DIR_MODIFY was never enabled in uapi and its functionality is about to be
    superseded by events FAN_CREATE, FAN_DELETE, FAN_MOVE with group
    flag FAN_REPORT_NAME.

    Keep a place holder variable name_event instead of removing the
    name recording code since it will be used by the new events.

    Link: https://lore.kernel.org/r/20200708111156.24659-17-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • The 'inode' argument to handle_event(), sometimes referred to as
    'to_tell' is somewhat obsolete.
    It is a remnant from the times when a group could only have an inode mark
    associated with an event.

    We now pass an iter_info array to the callback, with all marks associated
    with an event.

    Most backends ignore this argument, with two exceptions:
    1. dnotify uses it for sanity check that event is on directory
    2. fanotify uses it to report fid of directory on directory entry
    modification events

    Remove the 'inode' argument and add a 'dir' argument.
    The callback function signature is deliberately changed, because
    the meaning of the argument has changed and the arguments have
    been documented.

    The 'dir' argument is set when 'file_name' is specified and it
    refers to the directory that the 'file_name' entry belongs to.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

15 Jul, 2020

4 commits

  • Break up fanotify_alloc_event() into helpers by event struct type.

    Suggested-by: Jan Kara
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • The special overflow event is allocated as struct fanotify_path_event,
    but with a null path.

    Use a special event type to identify the overflow event, so the helper
    fanotify_has_event_path() will always indicate a non null path.

    Allocating the overflow event doesn't need any of the fancy stuff in
    fanotify_alloc_event(), so create a simplified helper for allocating the
    overflow event.

    There is also no need to store and report the pid with an overflow event.

    Link: https://lore.kernel.org/r/20200708111156.24659-7-amir73il@gmail.com
    Suggested-by: Jan Kara
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Return non const inode pointer from fsnotify_data_inode().
    None of the fsnotify hooks pass const inode pointer as data and
    callers often need to cast to a non const pointer.

    Link: https://lore.kernel.org/r/20200708111156.24659-3-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • When user provides large buffer for events and there are lots of events
    available, we can try to copy them all to userspace without scheduling
    which can softlockup the kernel (furthermore exacerbated by the
    contention on notification_lock). Add a scheduling point after copying
    each event.

    Note that usually the real underlying problem is the cost of fanotify
    event merging and the resulting contention on notification_lock but this
    is a cheap way to somewhat reduce the problem until we can properly
    address that.

    Reported-by: Francesco Ruggeri
    Link: https://lore.kernel.org/lkml/20200714025417.A25EB95C0339@us180.sjc.aristanetworks.com
    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Jan Kara
     

14 Jun, 2020

1 commit

  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

05 Jun, 2020

1 commit

  • Pull fsnotify updates from Jan Kara:
    "Several smaller fixes and cleanups for fsnotify subsystem"

    * tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    fanotify: fix ignore mask logic for events on child and on dir
    fanotify: don't write with size under sizeof(response)
    fsnotify: Remove proc_fs.h include
    fanotify: remove reference to fill_event_metadata()
    fsnotify: add mutex destroy
    fanotify: prefix should_merge()
    fanotify: Replace zero-length array with flexible-array
    inotify: Fix error return code assignment flow.
    fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()

    Linus Torvalds
     

28 May, 2020

1 commit

  • FAN_DIR_MODIFY has been enabled by commit 44d705b0370b ("fanotify:
    report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are
    planning further extensions to the fanotify API and during that we
    realized that FAN_DIR_MODIFY may behave slightly differently to be more
    consistent with extensions we plan. So until we finalize these
    extensions, let's not bind our hands with exposing FAN_DIR_MODIFY to
    userland.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

25 May, 2020

1 commit

  • The comments in fanotify_group_event_mask() say:

    "If the event is on dir/child and this mark doesn't care about
    events on dir/child, don't send it!"

    Specifically, mount and filesystem marks do not care about events
    on child, but they can still specify an ignore mask for those events.
    For example, a group that has:
    - A mount mark with mask 0 and ignore_mask FAN_OPEN
    - An inode mark on a directory with mask FAN_OPEN | FAN_OPEN_EXEC
    with flag FAN_EVENT_ON_CHILD

    A child file open for exec would be reported to group with the FAN_OPEN
    event despite the fact that FAN_OPEN is in ignore mask of mount mark,
    because the mark iteration loop skips over non-inode marks for events
    on child when calculating the ignore mask.

    Move ignore mask calculation to the top of the iteration loop block
    before excluding marks for events on dir/child.
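
    The effect of the reordering can be reproduced with a small
    user-space model. Everything here is a hypothetical sketch: the SK_*
    bit values, struct sk_mark and sk_group_event_mask() are stand-ins
    invented for illustration, and the ignore_first parameter merely
    switches between the old and the new ordering inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values, not the real FAN_* constants. */
#define SK_OPEN             0x01u
#define SK_OPEN_EXEC        0x02u
#define SK_EVENT_ON_CHILD   0x80u

enum sk_mark_type { SK_MARK_INODE, SK_MARK_MOUNT };

struct sk_mark {
    enum sk_mark_type type;
    uint32_t mask;
    uint32_t ignored_mask;
};

/*
 * Compute which event bits a group should see. With ignore_first set,
 * every mark contributes its ignored mask before marks that do not care
 * about events on child are skipped (the fixed ordering).
 */
static uint32_t sk_group_event_mask(const struct sk_mark *marks, size_t n,
                                    uint32_t event, int on_child,
                                    int ignore_first)
{
    uint32_t marks_mask = 0, marks_ignored = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        const struct sk_mark *m = &marks[i];
        int skip = on_child && (m->type != SK_MARK_INODE ||
                                !(m->mask & SK_EVENT_ON_CHILD));

        if (ignore_first)
            marks_ignored |= m->ignored_mask;
        if (skip)
            continue;
        if (!ignore_first)
            marks_ignored |= m->ignored_mask;
        marks_mask |= m->mask;
    }
    return event & marks_mask & ~marks_ignored;
}
```

    Fed with the mount mark and inode mark from the example above, the
    old ordering still reports the open bit, while the new ordering
    leaves only the open-for-exec bit.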

    Link: https://lore.kernel.org/r/20200524072441.18258-1-amir73il@gmail.com
    Reported-by: Jan Kara
    Link: https://lore.kernel.org/linux-fsdevel/20200521162443.GA26052@quack2.suse.cz/
    Fixes: 55bf882c7f13 "fanotify: fix merging marks masks with FAN_ONDIR"
    Fixes: b469e7e47c8a "fanotify: fix handling of events on child..."
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

13 May, 2020

3 commits


08 May, 2020

1 commit

  • The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.
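
    As a rough illustration of how such a type is typically allocated,
    consider the sketch below. The element type struct boo, the use of
    'stuff' as an element count and the foo_alloc() helper are
    assumptions made for this example only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type for the example. */
struct boo {
    int value;
};

struct foo {
    int stuff;              /* assumed here to hold the element count */
    struct boo array[];     /* flexible array member, must come last */
};

/* Allocate the fixed part plus room for count trailing elements. */
static struct foo *foo_alloc(int count)
{
    struct foo *f;

    if (count < 0)
        return NULL;
    f = calloc(1, sizeof(*f) + (size_t)count * sizeof(f->array[0]));
    if (!f)
        return NULL;
    f->stuff = count;
    return f;
}
```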

    Also, notice that, dynamic memory allocations won't be affected by
    this change:

    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]

    sizeof(flexible-array-member) triggers a warning because flexible array
    members have incomplete type[1]. There are some instances of code in
    which the sizeof operator is being incorrectly/erroneously applied to
    zero-length arrays and the result is zero. Such instances may be hiding
    some bugs. So, this work (flexible-array member conversions) will also
    help to get completely rid of those sorts of issues.

    This issue was found with the help of Coccinelle.

    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] https://github.com/KSPP/linux/issues/21
    [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

    Link: https://lore.kernel.org/r/20200507185230.GA14229@embeddedor
    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Jan Kara

    Gustavo A. R. Silva
     

30 Mar, 2020

1 commit

  • Clang warns:

    fs/notify/fanotify/fanotify.c:28:23: warning: self-comparison always
    evaluates to true [-Wtautological-compare]
    return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
    ^
    fs/notify/fanotify/fanotify.c:28:57: warning: self-comparison always
    evaluates to true [-Wtautological-compare]
    return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
    ^
    2 warnings generated.

    The intention was clearly to compare val[0] and val[1] in the two
    different fsid structs. Fix it, because otherwise this function always
    returns true.
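
    The intended comparison, sketched in user space. struct sk_fsid and
    sk_fsid_equal() are hypothetical stand-ins for the kernel type and
    helper; only the shape of the comparison follows from the warning
    quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's fsid type. */
struct sk_fsid {
    int val[2];
};

/* Compare each word of one fsid against the same word of the other. */
static bool sk_fsid_equal(const struct sk_fsid *fsid1,
                          const struct sk_fsid *fsid2)
{
    return fsid1->val[0] == fsid2->val[0] &&
           fsid1->val[1] == fsid2->val[1];
}
```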

    Fixes: afc894c784c8 ("fanotify: Store fanotify handles differently")
    Link: https://github.com/ClangBuiltLinux/linux/issues/952
    Link: https://lore.kernel.org/r/20200327171030.30625-1-natechancellor@gmail.com
    Signed-off-by: Nathan Chancellor
    Reviewed-by: Nick Desaulniers
    Signed-off-by: Jan Kara

    Nathan Chancellor
     

26 Mar, 2020

2 commits

  • Report event FAN_DIR_MODIFY with name in a variable length record similar
    to how fid's are reported. With name info reporting implemented, setting
    FAN_DIR_MODIFY in mark mask is now allowed.

    When events are reported with name, the reported fid identifies the
    directory and the name follows the fid. The info record type for this
    event info is FAN_EVENT_INFO_TYPE_DFID_NAME.

    For now, all reported events have at most one info record which is
    either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for
    FAN_DIR_MODIFY). Later on, events "on child" will report both records.

    There are several ways that an application can use this information:

    1. When watching a single directory, the name is always relative to
    the watched directory, so the application needs to call fstatat(2) on
    the name relative to the watched directory.

    2. When watching a set of directories, the application could keep a map
    of dirfd for all watched directories and hash the map by fid obtained
    with name_to_handle_at(2). When getting a name event, the fid in the
    event info could be used to lookup the base dirfd in the map and then
    call fstatat(2) with that dirfd.

    3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
    directories, the application could use open_by_handle_at(2) with the fid
    in event info to obtain dirfd for the directory where event happened and
    call fstatat(2) with this dirfd.

    The last option scales better for a large number of watched directories.
    The first two options may be available in the future also for non
    privileged fanotify watchers, because they do not rely on
    open_by_handle_at(2), which requires the CAP_DAC_READ_SEARCH capability.
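
    Whichever way the dirfd is obtained, the final step is the same: ask
    the filesystem what the named entry looks like now. A minimal sketch
    of that step follows; the sk_entry_state enum and sk_query_entry()
    are hypothetical names introduced for this example, while fstatat(2)
    is the call named above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result type for the example. */
enum sk_entry_state {
    SK_ENTRY_ERROR = -1,
    SK_ENTRY_MISSING,   /* the name no longer exists */
    SK_ENTRY_DIR,       /* the name currently refers to a directory */
    SK_ENTRY_OTHER      /* the name currently refers to something else */
};

/* Query the current state of 'name' relative to the directory 'dirfd'. */
static enum sk_entry_state sk_query_entry(int dirfd, const char *name)
{
    struct stat st;

    if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return errno == ENOENT ? SK_ENTRY_MISSING : SK_ENTRY_ERROR;
    return S_ISDIR(st.st_mode) ? SK_ENTRY_DIR : SK_ENTRY_OTHER;
}
```

    The result describes the entry at the time of the call, which may
    already differ from its state when the event was generated.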

    Link: https://lore.kernel.org/r/20200319151022.31456-15-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • For FAN_DIR_MODIFY event, allocate a variable size event struct to store
    the dir entry name along side the directory file handle.

    At this point, name info reporting is not yet implemented, so trying to
    set FAN_DIR_MODIFY in mark mask will return -EINVAL.

    Link: https://lore.kernel.org/r/20200319151022.31456-14-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

25 Mar, 2020

3 commits

  • When some events have directory id and some object id,
    fanotify_event_has_fid() becomes mostly useless and confusing because we
    usually need to know which type of file handle the event has. So just
    drop the function and use fanotify_event_object_fh() instead.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • For some events, we are going to report both child and parent fid's,
    so pass fsid and file handle as arguments to copy_fid_to_user(),
    which is going to be called with parent and child file handles.

    Link: https://lore.kernel.org/r/20200319151022.31456-13-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     
  • Dirent events are going to be supported in two flavors:

    1. Directory fid info + mask that includes the specific event types
    (e.g. FAN_CREATE) and an optional FAN_ONDIR flag.
    2. Directory fid info + name + mask that includes only FAN_DIR_MODIFY.

    To request the second event flavor, user needs to set the event type
    FAN_DIR_MODIFY in the mark mask.

    The first flavor is supported since kernel v5.1 for groups initialized
    with flag FAN_REPORT_FID. It is intended to be used for watching
    directories in "batch mode" - the watcher is notified when directory is
    changed and re-scans the directory content in response. This event
    flavor is stored more compactly in the event queue, so it is optimal
    for workloads with frequent directory changes.

    The second event flavor is intended to be used for watching large
    directories, where the cost of re-scan of the directory on every change
    is considered too high. The watcher getting the event with the directory
    fid and entry name is expected to call fstatat(2) to query the content of
    the entry after the change.

    Legacy inotify events are reported with name and event mask (e.g. "foo",
    FAN_CREATE | FAN_ONDIR). That can lead users to the conclusion that
    there is *currently* an entry "foo" that is a sub-directory, when in fact
    "foo" may be negative or non-dir by the time user gets the event.

    To make it clear that the current state of the named entry is unknown,
    when reporting an event with name info, fanotify obfuscates the specific
    event types (e.g. create,delete,rename) and uses a common event type -
    FAN_DIR_MODIFY to describe the change. This should make it harder for
    users to make wrong assumptions and write buggy filesystem monitors.

    At this point, name info reporting is not yet implemented, so trying to
    set FAN_DIR_MODIFY in mark mask will return -EINVAL.

    Link: https://lore.kernel.org/r/20200319151022.31456-12-amir73il@gmail.com
    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein