11 Dec, 2008

1 commit

  • On umount, two events will be dispatched to the watcher:

    1: inotify_dev_queue_event(.., IN_UNMOUNT,..)
    2: remove_watch(watch, dev)
    ->inotify_dev_queue_event(.., IN_IGNORED, ..)

    But if the watch has the IN_ONESHOT bit set, it will be released
    while the first event is handled, which results in an invalid object
    being accessed later. IMHO it is not a pure regression. This bug was
    not triggered during the initial inotify interface testing phase
    because of another bug in the IN_ONESHOT handling logic :)

    commit ac74c00e499ed276a965e5b5600667d5dc04a84a
    Author: Ulisses Furquim
    Date: Fri Feb 8 04:18:16 2008 -0800
    inotify: fix check for one-shot watches before destroying them
    As the IN_ONESHOT bit is never set when an event is sent we must check it
    in the watch's mask and not in the event's mask.

    TESTCASE:
    mkdir mnt
    mount -ttmpfs none mnt
    mkdir mnt/d
    ./inotify mnt/d&
    umount mnt ## << lockup or crash here

    TESTSOURCE:
    /* gcc -oinotify inotify.c */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/inotify.h>

    int main(int argc, char **argv)
    {
        char buf[1024];
        struct inotify_event *ie;
        char *p;
        int i;
        ssize_t l;

        p = argv[1];
        i = inotify_init();
        /* ~0 sets every mask bit, IN_ONESHOT included */
        inotify_add_watch(i, p, ~0);

        /* blocks until the first event arrives */
        l = read(i, buf, sizeof(buf));
        printf("read %zd bytes\n", l);
        ie = (struct inotify_event *) buf;
        printf("event mask: %u\n", ie->mask);
        return 0;
    }

    Signed-off-by: Dmitri Monakhov
    Cc: John McCutchan
    Cc: Al Viro
    Cc: Robert Love
    Cc: Ulisses Furquim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitri Monakhov
     

16 Nov, 2008

1 commit

  • Inotify watch removals suck violently.

    To kick the watch out we need (in this order) inode->inotify_mutex and
    ih->mutex. That's fine if we have a hold on inode; however, for all
    other cases we need to make damn sure we don't race with umount. We can
    *NOT* just grab a reference to a watch - inotify_unmount_inodes() will
    happily sail past it and we'll end with reference to inode potentially
    outliving its superblock.

    Ideally we just want to grab an active reference to superblock if we
    can; that will make sure we won't go into inotify_unmount_inodes() until
    we are done. Cleanup is just deactivate_super().

    However, that leaves a messy case - what if we *are* racing with
    umount() and active references to superblock can't be acquired anymore?
    We can bump ->s_count, grab ->s_umount, which will almost certainly wait
    until the superblock is shut down and the watch in question is pining
    for fjords. That's fine, but there is a problem - we might have hit the
    window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e.
    the moment when superblock is past the point of no return and is heading
    for shutdown) and the moment when deactivate_super() acquires
    ->s_umount.

    We could just do drop_super(), yield() and retry, but that's rather
    antisocial and this stuff is luser-triggerable. OTOH, having grabbed
    ->s_umount and having found that we'd got there first (i.e. that
    ->s_root is non-NULL) we know that we won't race with
    inotify_unmount_inodes().

    So we could grab a reference to watch and do the rest as above, just
    with drop_super() instead of deactivate_super(), right? Wrong. We had
    to drop ih->mutex before we could grab ->s_umount. So the watch
    could've been gone already.

    That still can be dealt with - we need to save watch->wd, do idr_find()
    and compare its result with our pointer. If they match, we either have
    the damn thing still alive or we'd lost not one but two races at once,
    the watch had been killed and a new one got created with the same ->wd
    at the same address. That couldn't have happened in inotify_destroy(),
    but inotify_rm_wd() could run into that. Still, "new one got created"
    is not a problem - we have every right to kill it or leave it alone,
    whatever's more convenient.

    So we can use idr_find(...) == watch && watch->inode->i_sb == sb as
    "grab it and kill it" check. If it's been our original watch, we are
    fine, if it's a newcomer - nevermind, just pretend that we'd won the
    race and kill the fscker anyway; we are safe since we know that its
    superblock won't be going away.

    And yes, this is far beyond mere "not very pretty"; so's the entire
    concept of inotify to start with.

    Signed-off-by: Al Viro
    Acked-by: Greg KH
    Signed-off-by: Linus Torvalds

    Al Viro
     

07 Feb, 2008

2 commits

  • The inotify debugging code is supposed to verify that the
    DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in
    notifications getting lost nor extra needless locking generated.

    Unfortunately there are also some races in the debugging code. And it isn't
    very good at finding problems anyway. So remove it for now.

    Signed-off-by: Nick Piggin
    Cc: Robert Love
    Cc: John McCutchan
    Cc: Jan Kara
    Cc: Yan Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • There is a race between setting an inode's children's "parent watched" flag
    when placing the first watch on a parent, and instantiating new children of
    that parent: a child could miss having its flags set by
    set_dentry_child_flags, but then inotify_d_instantiate might still see
    !inotify_inode_watched.

    The solution is to set_dentry_child_flags after adding the watch. Locking is
    taken care of, because both set_dentry_child_flags and inotify_d_instantiate
    hold dcache_lock and child->d_locks.

    Signed-off-by: Nick Piggin
    Cc: Robert Love
    Cc: John McCutchan
    Cc: Jan Kara
    Cc: Yan Zheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

21 Oct, 2007

2 commits


09 May, 2007

1 commit

  • There are many places in the kernel where constructions like

    foo = list_entry(head->next, struct foo_struct, list);

    are used.
    The code might look more descriptive and neat if it used the macro

    #define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

    Here is the macro itself and examples of its usage in the generic code.
    If it turns out to be useful, I can prepare a set of patches to
    inject it into arch-specific code, drivers, networking, etc.

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Cc: Randy Dunlap
    Cc: Andi Kleen
    Cc: Zach Brown
    Cc: Davide Libenzi
    Cc: John McCutchan
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: john stultz
    Cc: Ram Pai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     

04 Dec, 2006

1 commit


20 Jun, 2006

4 commits

  • Allow callers to remove watches from their event handler via
    inotify_remove_watch_locked(). This functionality can be used to
    achieve IN_ONESHOT-like functionality for a subset of events in the
    mask.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     
  • Add inotify_init_watch() so caller can use inotify_watch refcounts
    before calling inotify_add_watch().

    Add inotify_find_watch() to find an existing watch for an (ih,inode)
    pair. This is similar to inotify_find_update_watch(), but does not
    update the watch's mask if one is found.

    Add inotify_rm_watch() to remove a watch via the watch pointer instead
    of the watch descriptor.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     
  • When an inotify event includes a dentry name, also include the inode
    associated with that name.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     
  • The following series of patches introduces a kernel API for inotify,
    making it possible for kernel modules to benefit from inotify's
    mechanism for watching inodes. With these patches, inotify will
    maintain for each caller a list of watches (via an embedded struct
    inotify_watch), where each inotify_watch is associated with a
    corresponding struct inode. The caller registers an event handler and
    specifies for which filesystem events their event handler should be
    called per inotify_watch.

    Signed-off-by: Amy Griffis
    Acked-by: Robert Love
    Acked-by: John McCutchan
    Signed-off-by: Al Viro

    Amy Griffis
     

22 May, 2006

2 commits

  • Don't reassign to watch. If idr_find() returns NULL, then
    put_inotify_watch() will choke.

    Signed-off-by: Amy Griffis
    Cc: John McCutchan
    Cc: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amy Griffis
     
  • While doing some inotify stress testing, I hit the following race. In
    inotify_release(), it's possible for a watch to be removed from the lists
    in between dropping dev->mutex and taking inode->inotify_mutex. The
    reference we hold prevents the watch from being freed, but not from being
    removed.

    Checking the dev's idr mapping will prevent a double list_del of the
    same watch.

    Signed-off-by: Amy Griffis
    Acked-by: John McCutchan
    Cc: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amy Griffis
     

11 Apr, 2006

1 commit

  • The spufs file system creates files in a directory before instantiating the
    directory itself, which causes a NULL pointer access in
    inotify_d_instantiate since c32ccd87bfd1414b0aabfcd8dbc7539ad23bcbaa.

    I'd like to keep this behavior since it means that the user will not have
    access to files in the directory before I know that I succeed in creating
    everything in it. This patch adds a simple check for the inode to keep
    that working.

    Signed-off-by: Arnd Bergmann
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

29 Mar, 2006

1 commit

  • This is a conversion to make the various file_operations structs in fs/
    const. Basically a regexp job, with a few manual fixups.

    The goal is both to increase correctness (harder to accidentally write to
    shared data structures) and to reduce the false sharing of cachelines with
    things that get dirty in .data (while .rodata is nicely read-only and thus
    cache clean).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

27 Mar, 2006

1 commit

  • While hunting with oprofile on an SMP platform, I discovered that dentry
    lookups were slowed down because d_hash_mask, d_hash_shift and
    dentry_hashtable were in a cache line that contained inodes_stat. So each
    time inodes_stat is changed by a cpu, other cpus have to refill their
    cache line.

    This patch moves some variables to the __read_mostly section, in order to
    avoid false sharing. RCU dentry lookups can go full speed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

26 Mar, 2006

1 commit

  • Previous inotify work avoidance is good when inotify is completely unused,
    but it breaks down if even a single watch is in place anywhere in the
    system. Robin Holt noticed that udev is one such culprit - it slows down a
    512-thread application on a 512 CPU system from 6 seconds to 22 minutes.

    Solve this by adding a flag in the dentry that tells inotify whether or not
    its parent inode has a watch on it. Event queueing to parent will skip
    taking locks if this flag is cleared. As for setting and clearing of this
    flag on all child dentries versus event delivery: in terms of race cases,
    this was shown to be equivalent to always performing the check.

    The essential behaviour is that activity occurring _after_ a watch has been
    added and _before_ it has been removed, will generate events.

    Signed-off-by: Nick Piggin
    Cc: Robert Love
    Cc: John McCutchan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

23 Mar, 2006

2 commits

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Cc: John McCutchan
    Signed-off-by: Andrew Morton
    Acked-by: Robert Love
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

08 Feb, 2006

1 commit


19 Jan, 2006

1 commit


13 Dec, 2005

1 commit

  • The below patch lets userspace have more control over the inodes that
    inotify will watch. It introduces two new flags.

    IN_ONLYDIR -- only watch the inode if it is a directory.
    This is needed to avoid the race that can occur when we want to be
    sure that we are watching a directory.

    IN_DONT_FOLLOW -- don't follow a symlink. In combination
    with IN_ONLYDIR we can make sure that we don't watch the target of
    symlinks.

    The issues the flags fix came up when writing the gnome-vfs inotify
    backend. Default behaviour is unchanged.

    Signed-off-by: John McCutchan
    Acked-by: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John McCutchan
     

09 Nov, 2005

1 commit

  • Most permission() calls have a struct nameidata * available. This helper
    takes that as an argument and thus makes sure we pass it down for lookup
    intents and prepares for per-mount read-only support where we need a struct
    vfsmount for checking whether a file is writeable.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

24 Oct, 2005

1 commit

  • Fix a bug which was reported and diagnosed by
    Stefan Jones

    IDR trees include a cache of idr_layer objects. There's no way to destroy
    this cache, so when we discard an overall idr tree we end up leaking some
    memory.

    Add and use idr_destroy() for this. v9fs and infiniband also need to use
    idr_destroy() to avoid leaks.

    Or, we make the cache global, like radix_tree_preload(). Which is probably
    better. Later.

    Cc: Eric Van Hensbergen
    Cc: Roland Dreier
    Cc: Robert Love
    Cc: John McCutchan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

08 Sep, 2005

2 commits

  • People have run into a problem when they do this:

    watch (file1, all_events);
    watch (file2, some_events);

    if file2 is a hard link to file1, some events will be missed because by
    default we replace the mask. The patch below adds a flag IN_MASK_ADD which
    will cause inotify to add to the existing mask if present.

    Signed-off-by: John McCutchan
    Signed-off-by: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John McCutchan
     
  • Bypass an inotify-related fastpath spinlock and several function calls on
    systems which have no inotify watches registered.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    John McCutchan
     

27 Aug, 2005

1 commit

  • There is an off-by-one problem with idr_get_new_above.

    The comment and function name suggest that it will return an id >
    starting_id, but it actually returned an id >= starting_id, and kernel
    callers other than inotify treated it as such.

    The patch below fixes the comment, and fixes inotify's usage. The
    function name still doesn't match the behaviour, but it never did.

    Signed-off-by: John McCutchan
    Signed-off-by: Linus Torvalds

    John McCutchan
     

16 Aug, 2005

1 commit


02 Aug, 2005

1 commit

  • When you rm a watch, an IN_IGNORED event is sent down the event queue
    with the watch descriptor that you just rm'd.

    If you then add a watch you could get the ignored watch's wd and if you
    haven't read the entire event queue, user space will think that its
    newly created watch was just ignored.

    To avoid this problem we just use idr_get_new_above instead of
    idr_get_new.

    Signed-off-by: John McCutchan
    Signed-off-by: Robert Love
    Signed-off-by: Linus Torvalds

    John McCutchan
     

27 Jul, 2005

7 commits


14 Jul, 2005

2 commits


13 Jul, 2005

1 commit

  • inotify is intended to correct the deficiencies of dnotify, particularly
    its inability to scale and its terrible user interface:

    * dnotify requires the opening of one fd per each directory
    that you intend to watch. This quickly results in too many
    open files and pins removable media, preventing unmount.
    * dnotify is directory-based. You only learn about changes to
    directories. Sure, a change to a file in a directory affects
    the directory, but you are then forced to keep a cache of
    stat structures.
    * dnotify's interface to user-space is awful. Signals?

    inotify provides a more usable, simple, powerful solution to file change
    notification:

    * inotify's interface is a system call that returns a fd, not SIGIO.
    You get a single fd, which is select()-able.
    * inotify has an event that says "the filesystem that the item
    you were watching is on was unmounted."
    * inotify can watch directories or files.

    Inotify is currently used by Beagle (a desktop search infrastructure),
    Gamin (a FAM replacement), and other projects.

    See Documentation/filesystems/inotify.txt.

    Signed-off-by: Robert Love
    Cc: John McCutchan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert Love