05 Sep, 2015
2 commits
-
Free list is used when all marks on given inode / mount should be
destroyed when inode / mount is going away. However we can free all of
the marks without using a special list with some care.Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I have a _tiny_ microbenchmark that sits in a loop and writes single
bytes to a file. Writing one byte to a tmpfs file is around 2x slower
than reading one byte from a file, which is a _bit_ more than I expecte.
This is a dumb benchmark, but I think it's hard to deny that write() is
a hot path and we should avoid unnecessary overhead there.I did a 'perf record' of 30-second samples of read and write. The top
item in a diffprofile is srcu_read_lock() from fsnotify(). There are
active inotify fd's from systemd, but nothing is actually listening to
the file or its part of the filesystem.I *think* we can avoid taking the srcu_read_lock() for the common case
where there are no actual marks on the file. This means that there will
both be nothing to notify for *and* implies that there is no need for
clearing the ignore mask.This patch gave a 13.1% speedup in writes/second on my test, which is an
improvement from the 10.8% that I saw with the last version.Signed-off-by: Dave Hansen
Reviewed-by: Jan Kara
Cc: Al Viro
Cc: Eric Paris
Cc: John McCutchan
Cc: Robert Love
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Dec, 2014
1 commit
-
There's a lot of common code in inode and mount marks handling. Factor it
out to a common helper function.Signed-off-by: Jan Kara
Cc: Eric Paris
Cc: Heinrich Schuchardt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Dec, 2014
1 commit
14 Nov, 2014
1 commit
-
fsnotify() needs to merge inode and mount marks lists when notifying
groups about events so that ignore masks from inode marks are reflected
in mount mark notifications and groups are notified in proper order
(according to priorities).Currently the sorting of the lists done by fsnotify_add_inode_mark() /
fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted
ignore masks not being used in some cases.Fix the problem by always using the same comparison function when
sorting / merging the mark lists.Thanks to Heinrich Schuchardt for improvements of my patch.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721
Signed-off-by: Jan Kara
Reported-by: Heinrich Schuchardt
Tested-by: Heinrich Schuchardt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Nov, 2014
1 commit
-
Signed-off-by: Al Viro
18 Feb, 2014
1 commit
-
My rework of handling of notification events (namely commit 7053aee26a35
"fsnotify: do not share events between notification groups") broke
sending of cookies with inotify events. We didn't propagate the value
passed to fsnotify() properly and passed 4 uninitialized bytes to
userspace instead (so it is also an information leak). Sadly I didn't
notice this during my testing because inotify cookies aren't used very
much and LTP inotify tests ignore them.Fix the problem by passing the cookie value properly.
Fixes: 7053aee26a3548ebaba046ae2e52396ccf56ac6c
Reported-by: Vegard Nossum
Signed-off-by: Jan Kara
22 Jan, 2014
2 commits
-
After removing event structure creation from the generic layer there is
no reason for separate .should_send_event and .handle_event callbacks.
So just remove the first one.Signed-off-by: Jan Kara
Reviewed-by: Christoph Hellwig
Cc: Eric Paris
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently fsnotify framework creates one event structure for each
notification event and links this event into all interested notification
groups. This is done so that we save memory when several notification
groups are interested in the event. However the need for event
structure shared between inotify & fanotify bloats the event structure
so the result is often higher memory consumption.Another problem is that fsnotify framework keeps path references with
outstanding events so that fanotify can return open file descriptors
with its events. This has the undesirable effect that filesystem cannot
be unmounted while there are outstanding events - a regression for
inotify compared to a situation before it was converted to fsnotify
framework. For fanotify this problem is hard to avoid and users of
fanotify should kind of expect this behavior when they ask for file
descriptors from notified files.This patch changes fsnotify and its users to create separate event
structure for each group. This allows for much simpler code (~400 lines
removed by this patch) and also smaller event structures. For example
on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
additional space for file name, additional 24 bytes for second and each
subsequent group linking the event, and additional 32 bytes for each
inotify group for private data. After the conversion inotify event
consumes 48 bytes plus space for file name which is considerably less
memory unless file names are long and there are several groups
interested in the events (both of which are uncommon). Fanotify event
fits in 56 bytes after the conversion (fanotify doesn't care about file
names so its events don't have to have it allocated). A win unless
there are four or more fanotify groups interested in the event.The conversion also solves the problem with unmount when only inotify is
used as we don't have to grab path references for inotify events.[hughd@google.com: fanotify: fix corruption preventing startup]
Signed-off-by: Jan Kara
Reviewed-by: Christoph Hellwig
Cc: Eric Paris
Cc: Al Viro
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Feb, 2013
1 commit
-
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;type T;
expression a,c,d,e;
identifier b;
statement S;
@@-T b;
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Jul, 2012
1 commit
-
Signed-off-by: Al Viro
31 May, 2012
1 commit
-
We don't use "mnt" anymore in send_to_group() after 1968f5eed5 ("fanotify:
use both marks when possible") was applied.Signed-off-by: Dan Carpenter
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro
04 Jan, 2012
1 commit
-
Signed-off-by: Al Viro
07 Jan, 2011
4 commits
-
dcache_inode_lock can be replaced with per-inode locking. Use existing
inode->i_lock for this. This is slightly non-trivial because we sometimes
need to find the inode from the dentry, which requires d_inode to be
stabilised (either with refcount or d_lock).Signed-off-by: Nick Piggin
-
dcache_lock no longer protects anything. remove it.
Signed-off-by: Nick Piggin
-
Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
from concurrent modification. d_alias is also protected by d_lock.Signed-off-by: Nick Piggin
-
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.Signed-off-by: Nick Piggin
29 Oct, 2010
2 commits
-
fsnotify perm events do not call fsnotify parent. That means you cannot
register a perm event on a directory and enforce permissions on all inodes in
that directory. This patch fixes that situation.Signed-off-by: Eric Paris
-
When fsnotify groups return errors they are ignored. For permissions
events these should be passed back up the stack, but for most events these
should continue to be ignored.Signed-off-by: Eric Paris
26 Oct, 2010
1 commit
-
Use dget_parent instead of opencoding it. This simplifies the code, but
more importanly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.It means we do grab a reference to the parent now if need to be watched,
but not with the specified mask. If this turns out to be a problem
we'll have to revisit it, but for now let's keep as much as possible
dcache internals inside dcache.[ch].Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
28 Aug, 2010
2 commits
-
The fsnotify main loop has 2 bools which indicated if we processed the
inode or vfsmount mark in that particular pass through the loop. These
bool can we replaced with the inode_group and vfsmount_group variables
and actually make the code a little easier to understand.Signed-off-by: Eric Paris
-
Marks were stored on the inode and vfsmonut mark list in order from
highest memory address to lowest memory address. The code to walk those
lists thought they were in order from lowest to highest with
unpredictable results when trying to match up marks from each. It was
possible that extra events would be sent to userspace when inode
marks ignoring events wouldn't get matched with the vfsmount marks.This problem only affected fanotify when using both vfsmount and inode
marks simultaneously.Signed-off-by: Eric Paris
23 Aug, 2010
3 commits
-
The interesting 2 list lockstep walking didn't quite work out if the inode
marks only had ignores and the vfsmount list requested events. The code to
shortcut list traversal would not run the inode list since it didn't have real
event requests. This code forces inode list traversal when a vfsmount mark
matches the event type. Maybe we could add an i_fsnotify_ignored_mask field
to struct inode to get the shortcut back, but it doesn't seem worth it to grow
struct inode again.I bet with the recent changes to lock the way we do now it would actually not
be a major perf hit to just drop i_fsnotify_mark_mask altogether. But that is
for another day.Signed-off-by: Eric Paris
-
The fsnotify main loop has 2 booleans which tell if a particular mark was
sent to the listeners or if it should be processed in the next pass. The
problem is that the booleans were not reset on each traversal of the loop.
So marks could get skipped even when they were not sent to the notifiers.Reported-by: Tvrtko Ursulin
Signed-off-by: Eric Paris -
The fanotify code is supposed to get the group from the mark. It accidentally
only used the inode_mark. If the vfsmount_mark was set but not the inode_mark
it would deref the NULL inode_mark. Get the group from the correct place.Reported-by: Tvrtko Ursulin
Signed-off-by: Eric Paris
13 Aug, 2010
1 commit
-
This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.Fix up various conflicts due to later fsnotify work.
Signed-off-by: Linus Torvalds
28 Jul, 2010
14 commits
-
fanotify currently, when given a vfsmount_mark will look up (if it exists)
the corresponding inode mark. This patch drops that lookup and uses the
mark provided.Signed-off-by: Eric Paris
-
should_send_event() and handle_event() will both need to look up the inode
event if they get a vfsmount event. Lets just pass both at the same time
since we have them both after walking the lists in lockstep.Signed-off-by: Eric Paris
-
We currently walk the list of marks on an inode followed by the list of
marks on the vfsmount. These are in order (by the memory address of the
group) so lets walk them both together. Eventually we can pass both the
inode mark and the vfsmount mark to helpers simultaneously.Signed-off-by: Eric Paris
-
currently ignored_mark clearing is done in a seperate list traversal
before the actual list traversal to send events. There is no need for
this. Do them at the same time.Signed-off-by: Eric Paris
-
The global fsnotify groups lists were invented as a way to increase the
performance of fsnotify by shortcutting events which were not interesting.
With the changes to walk the object lists rather than global groups lists
these shortcuts are not useful.Signed-off-by: Eric Paris
-
Because we walk the object->fsnotify_marks list instead of the global
fsnotify groups list we don't need the fsnotify_inode_mask and
fsnotify_vfsmount_mask as these were simply shortcuts in fsnotify() for
performance. They are now extra checks, rip them out.Signed-off-by: Eric Paris
-
The change to use srcu and walk the object list rather than the global
fsnotify_group list means that should_send_event is no longer needed for a
number of groups and can be simplified for others. Do that.Signed-off-by: Eric Paris
-
With the change of fsnotify to use srcu walking the marks list instead of
walking the global groups list we now know the mark in question. The code can
send the mark to the group's handling functions and the groups won't have to
find those marks themselves.Signed-off-by: Eric Paris
-
Currently reading the inode->i_fsnotify_marks or
vfsmount->mnt_fsnotify_marks lists are protected by a spinlock on both the
read and the write side. This patch protects the read side of those lists
with a new single srcu.Signed-off-by: Eric Paris
-
Al explains that calling dentry_open() with a mnt/dentry pair is only
garunteed to be safe if they are already used in an open struct file. To
make sure this is the case don't store and use a struct path in fsnotify,
always use a struct file.Signed-off-by: Eric Paris
-
It can be hard to debug fsnotify since there are so few printks. Use
pr_debug to allow for dynamic debugging.Signed-off-by: Eric Paris
-
This patch adds a check to make sure that all fsnotify bits are unique and we
cannot accidentally use the same bit for 2 different fsnotify event types.Signed-off-by: Eric Paris
-
It seems to me you are always returning 0 in fsnotify, when you should return
the error (EPERM) returned by fanotify.Signed-off-by: Jean-Christophe DUBOIS
Signed-off-by: Eric Paris -
introduce a new fsnotify hook, fsnotify_perm(), which is called from the
security code. This hook is used to allow fsnotify groups to make access
control decisions about events on the system. We also must change the
generic fsnotify function to return an error code if we intend these hooks
to be in any way useful.Signed-off-by: Eric Paris