28 Jul, 2010

4 commits


16 Nov, 2008

1 commit

  • Inotify watch removals suck violently.

    To kick the watch out we need (in this order) inode->inotify_mutex and
    ih->mutex. That's fine if we have a hold on inode; however, for all
    other cases we need to make damn sure we don't race with umount. We can
    *NOT* just grab a reference to a watch - inotify_unmount_inodes() will
    happily sail past it and we'll end with reference to inode potentially
    outliving its superblock.

    Ideally we just want to grab an active reference to superblock if we
    can; that will make sure we won't go into inotify_unmount_inodes() until
    we are done. Cleanup is just deactivate_super().

    However, that leaves a messy case - what if we *are* racing with
    umount() and active references to superblock can't be acquired anymore?
    We can bump ->s_count, grab ->s_umount, which will almost certainly wait
    until the superblock is shut down and the watch in question is pining
    for fjords. That's fine, but there is a problem - we might have hit the
    window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
    the moment when superblock is past the point of no return and is heading
    for shutdown) and the moment when deactivate_super() acquires
    ->s_umount.

    We could just do drop_super(), yield() and retry, but that's rather
    antisocial and this stuff is luser-triggerable. OTOH, having grabbed
    ->s_umount and having found that we'd got there first (i.e. that
    ->s_root is non-NULL) we know that we won't race with
    inotify_unmount_inodes().

    So we could grab a reference to watch and do the rest as above, just
    with drop_super() instead of deactivate_super(), right? Wrong. We had
    to drop ih->mutex before we could grab ->s_umount. So the watch
    could've been gone already.

    That still can be dealt with - we need to save watch->wd, do idr_find()
    and compare its result with our pointer. If they match, we either have
    the damn thing still alive or we'd lost not one but two races at once,
    the watch had been killed and a new one got created with the same ->wd
    at the same address. That couldn't have happened in inotify_destroy(),
    but inotify_rm_wd() could run into that. Still, "new one got created"
    is not a problem - we have every right to kill it or leave it alone,
    whatever's more convenient.

    So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
    "grab it and kill it" check. If it's been our original watch, we are
    fine, if it's a newcomer - never mind, just pretend that we'd won the
    race and kill the fscker anyway; we are safe since we know that its
    superblock won't be going away.
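
    The "look it up again and compare" check can be illustrated with a
    user-space toy model. This is only a sketch: every toy_* name below is
    hypothetical, the fixed-size slot table merely stands in for the idr, and
    none of this is the kernel code. It shows why the pointer comparison comes
    first: the saved pointer is dereferenced only if the lookup proves that a
    live watch sits at that address.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WD 8

/* Hypothetical stand-ins for the kernel structures named in the text.  */
struct toy_sb { int id; };
struct toy_inode { struct toy_sb *i_sb; };
struct toy_watch { int wd; struct toy_inode *inode; };
struct toy_handle { struct toy_watch *slots[TOY_MAX_WD]; };

/* Stand-in for idr_find(): map a watch descriptor to a live watch.  */
static struct toy_watch *
toy_idr_find (const struct toy_handle *ih, int wd)
{
  return (wd >= 0 && wd < TOY_MAX_WD) ? ih->slots[wd] : NULL;
}

/* Returns 1 if the watch saved earlier may be grabbed and killed.
   WATCH may be stale, so it is only compared, never dereferenced,
   until the lookup has returned the same pointer.  */
int
toy_can_kill (const struct toy_handle *ih, int saved_wd,
              const struct toy_watch *watch, const struct toy_sb *sb)
{
  assert (ih != NULL);
  struct toy_watch *found = toy_idr_find (ih, saved_wd);
  return found != NULL && found == watch && found->inode->i_sb == sb;
}
```

    A newcomer that reuses both the descriptor and the address passes the same
    test, which matches the "kill it anyway" reasoning above.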

    And yes, this is far beyond mere "not very pretty"; so's the entire
    concept of inotify to start with.

    Signed-off-by: Al Viro
    Acked-by: Greg KH
    Signed-off-by: Linus Torvalds

    Al Viro
     

25 Jul, 2008

2 commits

  • This patch adds non-blocking support for inotify_init1. The
    additional changes needed are minimal.

    The following test must be adjusted for architectures other than x86 and
    x86-64 and in case the syscall numbers changed.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef __NR_inotify_init1
    # ifdef __x86_64__
    # define __NR_inotify_init1 294
    # elif defined __i386__
    # define __NR_inotify_init1 332
    # else
    # error "need __NR_inotify_init1"
    # endif
    #endif

    #define IN_NONBLOCK O_NONBLOCK

    int
    main (void)
    {
      /* Without flags the descriptor must stay in blocking mode.  */
      int fd = syscall (__NR_inotify_init1, 0);
      if (fd == -1)
        {
          puts ("inotify_init1(0) failed");
          return 1;
        }
      int fl = fcntl (fd, F_GETFL);
      if (fl == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (fl & O_NONBLOCK)
        {
          puts ("inotify_init1(0) set non-blocking mode");
          return 1;
        }
      close (fd);

      /* With IN_NONBLOCK the descriptor must have O_NONBLOCK set.  */
      fd = syscall (__NR_inotify_init1, IN_NONBLOCK);
      if (fd == -1)
        {
          puts ("inotify_init1(IN_NONBLOCK) failed");
          return 1;
        }
      fl = fcntl (fd, F_GETFL);
      if (fl == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((fl & O_NONBLOCK) == 0)
        {
          puts ("inotify_init1(IN_NONBLOCK) does not set non-blocking mode");
          return 1;
        }
      close (fd);

      puts ("OK");

      return 0;
    }
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
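
    As an illustration of what the flag means for a reader of the descriptor,
    here is a minimal sketch that uses the glibc wrapper inotify_init1()
    rather than the raw syscall number. The helper names are hypothetical.
    With no event queued, read() on a non-blocking inotify descriptor is
    expected to fail at once with EAGAIN instead of sleeping.

```c
#include <assert.h>
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: open an inotify instance in non-blocking mode.
   Returns the descriptor, or -1 with errno set.  */
int
open_nonblocking_inotify (void)
{
  return inotify_init1 (IN_NONBLOCK);
}

/* Hypothetical helper: returns 1 if a read on FD fails immediately with
   EAGAIN (non-blocking mode, nothing queued), 0 otherwise.  Only call it
   on a non-blocking descriptor, or the read may sleep.  */
int
read_would_block (int fd)
{
  char buf[4096];

  assert (fd >= 0);
  ssize_t n = read (fd, buf, sizeof buf);
  return n == -1 && errno == EAGAIN;
}
```

    A caller would typically treat EAGAIN as "nothing to do yet" and go back
    to its event loop.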

    Signed-off-by: Ulrich Drepper
    Acked-by: Davide Libenzi
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     
  • This patch introduces the new syscall inotify_init1 (note: the 1 stands for
    the one parameter the syscall takes, as opposed to no parameter before). The
    values accepted for this parameter are function-specific and defined in the
    inotify.h header. Here the values must match the O_* flags, though. In this
    patch CLOEXEC support is introduced.

    The following test must be adjusted for architectures other than x86 and
    x86-64 and in case the syscall numbers changed.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef __NR_inotify_init1
    # ifdef __x86_64__
    # define __NR_inotify_init1 294
    # elif defined __i386__
    # define __NR_inotify_init1 332
    # else
    # error "need __NR_inotify_init1"
    # endif
    #endif

    #define IN_CLOEXEC O_CLOEXEC

    int
    main (void)
    {
      int fd;

      /* Without flags the descriptor must not be close-on-exec.  */
      fd = syscall (__NR_inotify_init1, 0);
      if (fd == -1)
        {
          puts ("inotify_init1(0) failed");
          return 1;
        }
      int coe = fcntl (fd, F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (coe & FD_CLOEXEC)
        {
          puts ("inotify_init1(0) set close-on-exec");
          return 1;
        }
      close (fd);

      /* With IN_CLOEXEC the descriptor must have FD_CLOEXEC set.  */
      fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
      if (fd == -1)
        {
          puts ("inotify_init1(IN_CLOEXEC) failed");
          return 1;
        }
      coe = fcntl (fd, F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((coe & FD_CLOEXEC) == 0)
        {
          puts ("inotify_init1(IN_CLOEXEC) does not set close-on-exec");
          return 1;
        }
      close (fd);

      puts ("OK");

      return 0;
    }
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    [akpm@linux-foundation.org: add sys_ni stub]
    Signed-off-by: Ulrich Drepper
    Acked-by: Davide Libenzi
    Cc: Michael Kerrisk
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     

21 Oct, 2007

2 commits


20 Jun, 2006

5 commits

  • Allow callers to remove watches from their event handler via
    inotify_remove_watch_locked(). This functionality can be used to
    achieve IN_ONESHOT-like functionality for a subset of events in the
    mask.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     
  • Add inotify_init_watch() so caller can use inotify_watch refcounts
    before calling inotify_add_watch().

    Add inotify_find_watch() to find an existing watch for an (ih,inode)
    pair. This is similar to inotify_find_update_watch(), but does not
    update the watch's mask if one is found.

    Add inotify_rm_watch() to remove a watch via the watch pointer instead
    of the watch descriptor.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     
  • When an inotify event includes a dentry name, also include the inode
    associated with that name.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     
  • The following series of patches introduces a kernel API for inotify,
    making it possible for kernel modules to benefit from inotify's
    mechanism for watching inodes. With these patches, inotify will
    maintain for each caller a list of watches (via an embedded struct
    inotify_watch), where each inotify_watch is associated with a
    corresponding struct inode. The caller registers an event handler and
    specifies for which filesystem events their event handler should be
    called per inotify_watch.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     
  • Signed-off-by: Al Viro

    Al Viro
     

26 Mar, 2006

1 commit

  • Previous inotify work avoidance is good when inotify is completely unused,
    but it breaks down if even a single watch is in place anywhere in the
    system. Robin Holt notices that udev is one such culprit - it slows down a
    512-thread application on a 512 CPU system from 6 seconds to 22 minutes.

    Solve this by adding a flag in the dentry that tells inotify whether or not
    its parent inode has a watch on it. Event queueing to parent will skip
    taking locks if this flag is cleared. Setting and clearing of this flag on
    all child dentries versus event delivery: this is no worse in terms of race
    cases, and that was shown to be equivalent to always performing the check.

    The essential behaviour is that activity occurring _after_ a watch has been
    added and _before_ it has been removed will generate events.

    Signed-off-by: Nick Piggin
    Cc: Robert Love
    Cc: John McCutchan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

13 Dec, 2005

1 commit

  • The below patch lets userspace have more control over the inodes that
    inotify will watch. It introduces two new flags.

    IN_ONLYDIR -- only watch the inode if it is a directory.
    This is needed to avoid the race that can occur when we want to be
    sure that we are watching a directory.

    IN_DONT_FOLLOW -- don't follow a symlink. In combination
    with IN_ONLYDIR we can make sure that we don't watch the target of
    symlinks.

    The issues the flags fix came up when writing the gnome-vfs inotify
    backend. Default behaviour is unchanged.

    Signed-off-by: John McCutchan
    Acked-by: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John McCutchan
     

08 Sep, 2005

1 commit

  • People have run into a problem when they do this:

        watch (file1, all_events);
        watch (file2, some_events);

    If file2 is a hard link to file1, some events will be missed because by
    default the second call replaces the mask. The patch below adds a flag,
    IN_MASK_ADD, which causes inotify to add to the existing mask if present.

    Signed-off-by: John McCutchan
    Signed-off-by: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John McCutchan
     

16 Aug, 2005

1 commit

  • This adds an IN_MOVE_SELF event to inotify. It is sent whenever the inode
    you are watching is moved. We need this event so that we can catch
    something like this:

    - app1:
        watch /etc/mtab

    - app2:
        cp /etc/mtab /tmp/mtab-work
        mv /etc/mtab /etc/mtab~
        mv /tmp/mtab-work /etc/mtab

    app1 still thinks it's watching /etc/mtab but it's actually watching
    /etc/mtab~.

    Signed-off-by: John McCutchan
    Signed-off-by: Robert Love
    Signed-off-by: Linus Torvalds

    John McCutchan
     

13 Jul, 2005

1 commit

  • inotify is intended to correct the deficiencies of dnotify, particularly
    its inability to scale and its terrible user interface:

    * dnotify requires the opening of one fd per each directory
      that you intend to watch. This quickly results in too many
      open files and pins removable media, preventing unmount.
    * dnotify is directory-based. You only learn about changes to
      directories. Sure, a change to a file in a directory affects
      the directory, but you are then forced to keep a cache of
      stat structures.
    * dnotify's interface to user-space is awful. Signals?

    inotify provides a more usable, simple, powerful solution to file change
    notification:

    * inotify's interface is a system call that returns a fd, not SIGIO.
      You get a single fd, which is select()-able.
    * inotify has an event that says "the filesystem that the item
      you were watching is on was unmounted."
    * inotify can watch directories or files.

    Inotify is currently used by Beagle (a desktop search infrastructure),
    Gamin (a FAM replacement), and other projects.

    See Documentation/filesystems/inotify.txt.

    Signed-off-by: Robert Love
    Cc: John McCutchan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Love