05 Sep, 2015

2 commits

  • Free list is used when all marks on given inode / mount should be
    destroyed when inode / mount is going away. However we can free all of
    the marks without using a special list with some care.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • I have a _tiny_ microbenchmark that sits in a loop and writes single
    bytes to a file. Writing one byte to a tmpfs file is around 2x slower
    than reading one byte from a file, which is a _bit_ more than I expecte.
    This is a dumb benchmark, but I think it's hard to deny that write() is
    a hot path and we should avoid unnecessary overhead there.

    I did a 'perf record' of 30-second samples of read and write. The top
    item in a diffprofile is srcu_read_lock() from fsnotify(). There are
    active inotify fd's from systemd, but nothing is actually listening to
    the file or its part of the filesystem.

    I *think* we can avoid taking the srcu_read_lock() for the common case
    where there are no actual marks on the file. This means that there will
    both be nothing to notify for *and* implies that there is no need for
    clearing the ignore mask.

    This patch gave a 13.1% speedup in writes/second on my test, which is an
    improvement from the 10.8% that I saw with the last version.

    Signed-off-by: Dave Hansen
    Reviewed-by: Jan Kara
    Cc: Al Viro
    Cc: Eric Paris
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

14 Dec, 2014

1 commit


09 Dec, 2014

1 commit


14 Nov, 2014

1 commit

  • fsnotify() needs to merge inode and mount marks lists when notifying
    groups about events so that ignore masks from inode marks are reflected
    in mount mark notifications and groups are notified in proper order
    (according to priorities).

    Currently the sorting of the lists done by fsnotify_add_inode_mark() /
    fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted
    ignore masks not being used in some cases.

    Fix the problem by always using the same comparison function when
    sorting / merging the mark lists.

    Thanks to Heinrich Schuchardt for improvements of my patch.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721
    Signed-off-by: Jan Kara
    Reported-by: Heinrich Schuchardt
    Tested-by: Heinrich Schuchardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

04 Nov, 2014

1 commit


18 Feb, 2014

1 commit

  • My rework of handling of notification events (namely commit 7053aee26a35
    "fsnotify: do not share events between notification groups") broke
    sending of cookies with inotify events. We didn't propagate the value
    passed to fsnotify() properly and passed 4 uninitialized bytes to
    userspace instead (so it is also an information leak). Sadly I didn't
    notice this during my testing because inotify cookies aren't used very
    much and LTP inotify tests ignore them.

    Fix the problem by passing the cookie value properly.

    Fixes: 7053aee26a3548ebaba046ae2e52396ccf56ac6c
    Reported-by: Vegard Nossum
    Signed-off-by: Jan Kara

    Jan Kara
     

22 Jan, 2014

2 commits

  • After removing event structure creation from the generic layer there is
    no reason for separate .should_send_event and .handle_event callbacks.
    So just remove the first one.

    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Currently fsnotify framework creates one event structure for each
    notification event and links this event into all interested notification
    groups. This is done so that we save memory when several notification
    groups are interested in the event. However the need for event
    structure shared between inotify & fanotify bloats the event structure
    so the result is often higher memory consumption.

    Another problem is that fsnotify framework keeps path references with
    outstanding events so that fanotify can return open file descriptors
    with its events. This has the undesirable effect that filesystem cannot
    be unmounted while there are outstanding events - a regression for
    inotify compared to a situation before it was converted to fsnotify
    framework. For fanotify this problem is hard to avoid and users of
    fanotify should kind of expect this behavior when they ask for file
    descriptors from notified files.

    This patch changes fsnotify and its users to create separate event
    structure for each group. This allows for much simpler code (~400 lines
    removed by this patch) and also smaller event structures. For example
    on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
    additional space for file name, additional 24 bytes for second and each
    subsequent group linking the event, and additional 32 bytes for each
    inotify group for private data. After the conversion inotify event
    consumes 48 bytes plus space for file name which is considerably less
    memory unless file names are long and there are several groups
    interested in the events (both of which are uncommon). Fanotify event
    fits in 56 bytes after the conversion (fanotify doesn't care about file
    names so its events don't have to have it allocated). A win unless
    there are four or more fanotify groups interested in the event.

    The conversion also solves the problem with unmount when only inotify is
    used as we don't have to grab path references for inotify events.

    [hughd@google.com: fanotify: fix corruption preventing startup]
    Signed-off-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

14 Jul, 2012

1 commit


31 May, 2012

1 commit


04 Jan, 2012

1 commit


07 Jan, 2011

4 commits

  • dcache_inode_lock can be replaced with per-inode locking. Use existing
    inode->i_lock for this. This is slightly non-trivial because we sometimes
    need to find the inode from the dentry, which requires d_inode to be
    stabilised (either with refcount or d_lock).

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
    from concurrent modification. d_alias is also protected by d_lock.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
    using dcache_lock for these anyway (eg. using i_mutex).

    Note: if we change the locking rule in future so that ->d_child protection is
    provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
    But it would be an exception to an otherwise regular locking scheme, so we'd
    have to see some good results. Probably not worthwhile.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

29 Oct, 2010

2 commits


26 Oct, 2010

1 commit

  • Use dget_parent instead of opencoding it. This simplifies the code, but
    more importanly prepares for the more complicated locking for a parent
    dget in the dcache scale patch series.

    It means we do grab a reference to the parent now if need to be watched,
    but not with the specified mask. If this turns out to be a problem
    we'll have to revisit it, but for now let's keep as much as possible
    dcache internals inside dcache.[ch].

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

28 Aug, 2010

2 commits

  • The fsnotify main loop has 2 bools which indicated if we processed the
    inode or vfsmount mark in that particular pass through the loop. These
    bool can we replaced with the inode_group and vfsmount_group variables
    and actually make the code a little easier to understand.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • Marks were stored on the inode and vfsmonut mark list in order from
    highest memory address to lowest memory address. The code to walk those
    lists thought they were in order from lowest to highest with
    unpredictable results when trying to match up marks from each. It was
    possible that extra events would be sent to userspace when inode
    marks ignoring events wouldn't get matched with the vfsmount marks.

    This problem only affected fanotify when using both vfsmount and inode
    marks simultaneously.

    Signed-off-by: Eric Paris

    Eric Paris
     

23 Aug, 2010

3 commits

  • The interesting 2 list lockstep walking didn't quite work out if the inode
    marks only had ignores and the vfsmount list requested events. The code to
    shortcut list traversal would not run the inode list since it didn't have real
    event requests. This code forces inode list traversal when a vfsmount mark
    matches the event type. Maybe we could add an i_fsnotify_ignored_mask field
    to struct inode to get the shortcut back, but it doesn't seem worth it to grow
    struct inode again.

    I bet with the recent changes to lock the way we do now it would actually not
    be a major perf hit to just drop i_fsnotify_mark_mask altogether. But that is
    for another day.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • The fsnotify main loop has 2 booleans which tell if a particular mark was
    sent to the listeners or if it should be processed in the next pass. The
    problem is that the booleans were not reset on each traversal of the loop.
    So marks could get skipped even when they were not sent to the notifiers.

    Reported-by: Tvrtko Ursulin
    Signed-off-by: Eric Paris

    Eric Paris
     
  • The fanotify code is supposed to get the group from the mark. It accidentally
    only used the inode_mark. If the vfsmount_mark was set but not the inode_mark
    it would deref the NULL inode_mark. Get the group from the correct place.

    Reported-by: Tvrtko Ursulin
    Signed-off-by: Eric Paris

    Eric Paris
     

13 Aug, 2010

1 commit

  • This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
    accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
    the final work in fput" that was a horribly ugly hack to make it work at
    all).

    The 'struct file' approach not only causes that disgusting hack, it
    somehow breaks pulseaudio, probably due to some other subtlety with
    f_count handling.

    Fix up various conflicts due to later fsnotify work.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

28 Jul, 2010

14 commits