Doug / smarc-fsl-linux-kernel | Embedian Git Server

30 Apr, 2013

1 commit

a66c04b45 inotify: convert inotify_add_to_idr() to use idr_alloc_cyclic()

Signed-off-by: Jeff Layton
Cc: John McCutchan
Cc: Robert Love
Cc: Eric Paris
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Layton
2013-04-30 09:28:41 +0800
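
The commit body gives no rationale. As I understand it, a cyclic allocator is used so that a watch descriptor that was just freed is not immediately handed to a new watch. The userspace model below only illustrates that search order. The kernel's idr_alloc_cyclic() is backed by the IDR tree. The names id_table, id_alloc_cyclic() and id_remove() and the tiny fixed-size table are hypothetical.

```c
#include <stddef.h>

/* Hypothetical fixed-size ID table; the kernel IDR is a radix tree,
 * this array only illustrates the "cyclic" search order. */
#define ID_MAX 8

struct id_table {
    void *slot[ID_MAX];   /* NULL means the ID is free */
    int next;             /* where the next search starts */
};

/* Return the first free ID in [start, ID_MAX), searching upward from
 * t->next and wrapping back to start once; -1 if the table is full. */
static int id_alloc_cyclic(struct id_table *t, void *ptr, int start)
{
    int from = t->next < start ? start : t->next;
    int span = ID_MAX - start;

    for (int i = 0; i < span; i++) {
        int id = start + (from - start + i) % span;
        if (t->slot[id] == NULL) {
            t->slot[id] = ptr;
            t->next = id + 1;   /* next search begins after this ID */
            return id;
        }
    }
    return -1;
}

static void id_remove(struct id_table *t, int id)
{
    t->slot[id] = NULL;
}
```

With IDs 1 to 3 in use and ID 1 freed, the next allocation returns 4. ID 1 is only reused after the search wraps around.
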

12 Dec, 2012

8 commits

0a6b6bd59 fsnotify: make fasync generic for both inotify and fanotify

inotify is supposed to support async signal notification when information
is available on the inotify fd. This patch moves that support to generic
fsnotify functions so it can be used by all notification mechanisms.

Signed-off-by: Eric Paris

Eric Paris
2012-12-12 02:44:36 +0800
6960b0d90 fsnotify: change locking order

On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
>
> I finally built and tested a v3.0 kernel with these patches (I know I'm
> SOOOOOO far behind). Not what I hoped for:
>
> > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day...
> > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> > [ 150.946012] IP: [] shmem_free_inode+0x18/0x50
> > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0
> > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [ 150.946012] CPU 0
> > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
> > [ 150.946012]
> > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
> > [ 150.946012] RIP: 0010:[] [] shmem_free_inode+0x18/0x50
> > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282
> > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
> > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
> > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
> > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
> > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
> > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
> > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
> > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
> > [ 150.946012] Stack:
> > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
> > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
> > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
> > [ 150.946012] Call Trace:
> > [ 150.946012] [] shmem_evict_inode+0x76/0x130
> > [ 150.946012] [] evict+0x7e/0x170
> > [ 150.946012] [] iput_final+0xd0/0x190
> > [ 150.946012] [] iput+0x33/0x40
> > [ 150.946012] [] fsnotify_destroy_mark_locked+0x145/0x160
> > [ 150.946012] [] fsnotify_destroy_mark+0x36/0x50
> > [ 150.946012] [] sys_inotify_rm_watch+0x77/0xd0
> > [ 150.946012] [] system_call_fastpath+0x16/0x1b
> > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00
> > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a
> > [ 150.946012] RIP [] shmem_free_inode+0x18/0x50
> > [ 150.946012] RSP
> > [ 150.946012] CR2: 0000000000000070
>
> Looks an awful lot like the problem from:
> http://www.spinics.net/lists/linux-fsdevel/msg46101.html
>

I tried to reproduce this bug with your test program, but without success.
However, if I understand correctly, this occurs because we don't hold any locks when
we call iput() in mark_destroy(), right?
With the patches you tested, iput() is also not called within any lock, since the
group's mark_mutex is released temporarily before iput() is called. This mirrors
the behaviour of the original code.
However, since we now have a mutex as the biggest lock, we can do what you
suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and
call iput() with the mutex held to avoid the race.
The patch below implements this. It uses nested locking to avoid deadlock in case
we do the final iput() on an inode which still holds marks and thus would take
the mutex again when calling fsnotify_inode_delete() in destroy_inode().

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2012-12-12 02:44:36 +0800
64c20d2a2 fsnotify: don't put marks on temporary list when clearing marks by group

In clear_marks_by_group_flags() the mark list of a group is iterated and the
marks are put on a temporary list.
Since we introduced fsnotify_destroy_mark_locked() we don't need the temp list
any more and are able to remove the marks while the mark list is iterated and
the mark list mutex is held.

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2012-12-12 02:44:36 +0800
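
Removing entries while walking a list, without a temporary list, only needs the walker to remember how to reach the next node before the current one is freed. The kernel does this with list_for_each_entry_safe(). The sketch below is a userspace model using a pointer-to-link walk over a singly linked list, with a pthread mutex standing in for the group's mark mutex. The types and the functions add_mark(), clear_marks_by_flags() and count_marks() are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a group's mark list. */
struct mark {
    unsigned int flags;
    struct mark *next;
};

struct group {
    pthread_mutex_t mark_mutex;
    struct mark *marks;          /* singly linked list head */
};

static void add_mark(struct group *g, unsigned int flags)
{
    struct mark *m = malloc(sizeof(*m));
    if (!m)
        abort();
    m->flags = flags;
    pthread_mutex_lock(&g->mark_mutex);
    m->next = g->marks;
    g->marks = m;
    pthread_mutex_unlock(&g->mark_mutex);
}

/* Free every mark whose flags intersect @flags in a single pass, with the
 * mutex held for the whole walk. @link always points at the pointer that
 * refers to the current node, so it can be unlinked in place. */
static int clear_marks_by_flags(struct group *g, unsigned int flags)
{
    int removed = 0;

    pthread_mutex_lock(&g->mark_mutex);
    struct mark **link = &g->marks;
    while (*link) {
        struct mark *m = *link;
        if (m->flags & flags) {
            *link = m->next;     /* unlink before freeing */
            free(m);
            removed++;
        } else {
            link = &m->next;
        }
    }
    pthread_mutex_unlock(&g->mark_mutex);
    return removed;
}

static int count_marks(struct group *g)
{
    int n = 0;
    pthread_mutex_lock(&g->mark_mutex);
    for (struct mark *m = g->marks; m; m = m->next)
        n++;
    pthread_mutex_unlock(&g->mark_mutex);
    return n;
}
```
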
d5a335b84 fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()

This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked()
which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but
assume that the caller has already taken the group's mark mutex.

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2012-12-12 02:44:36 +0800
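
The "_locked" naming convention can be shown with a small userspace sketch. The plain function takes the lock and delegates, and the _locked variant assumes the caller already holds it. This lets a caller make several changes atomically without re-taking a non-recursive mutex. The struct, the functions, the nr_marks/max_marks fields and the -ENOSPC return value are all hypothetical choices for illustration.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical group: only a mark counter and its limit. */
struct group {
    pthread_mutex_t mark_mutex;
    int nr_marks;
    int max_marks;
};

/* Caller must already hold g->mark_mutex. */
static int add_mark_locked(struct group *g)
{
    if (g->nr_marks >= g->max_marks)
        return -ENOSPC;
    g->nr_marks++;
    return 0;
}

/* Convenience wrapper that takes the lock itself. */
static int add_mark(struct group *g)
{
    pthread_mutex_lock(&g->mark_mutex);
    int ret = add_mark_locked(g);
    pthread_mutex_unlock(&g->mark_mutex);
    return ret;
}

/* All-or-nothing: take the lock once and use the _locked variant twice;
 * calling add_mark() here would deadlock on the non-recursive mutex. */
static int add_two_marks(struct group *g)
{
    int ret;

    pthread_mutex_lock(&g->mark_mutex);
    ret = add_mark_locked(g);
    if (ret == 0) {
        ret = add_mark_locked(g);
        if (ret)
            g->nr_marks--;   /* roll back the first add */
    }
    pthread_mutex_unlock(&g->mark_mutex);
    return ret;
}
```
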
e2a29943e fsnotify: pass group to fsnotify_destroy_mark()

Instead of having fsnotify_destroy_mark() get the group from the passed mark,
pass the group itself as an additional parameter to the function.

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2012-12-12 02:44:36 +0800
986ab0980 fsnotify: use a mutex instead of a spinlock to protect a groups mark list

Replaces the group's mark_lock spinlock with a mutex. Using a mutex instead
of a spinlock gives more flexibility (i.e. it allows sleeping while the
lock is held).

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2012-12-12 02:29:46 +0800
986129520 fsnotify: introduce fsnotify_get_group()

Introduce fsnotify_get_group() which increments the reference counter of a group.

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2012-12-12 02:29:44 +0800
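
A get/put reference count can be sketched in userspace with C11 atomics. The creator holds the first reference, each get adds one, and whichever put drops the count to zero destroys the object. The struct and the functions group_alloc(), group_get() and group_put() are hypothetical. The destroyed pointer exists only so the example can observe destruction.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted group. */
struct group {
    atomic_int refcnt;
    bool *destroyed;   /* lets the example observe destruction */
};

static struct group *group_alloc(bool *destroyed)
{
    struct group *g = malloc(sizeof(*g));
    if (!g)
        return NULL;
    atomic_init(&g->refcnt, 1);   /* the creator holds the first reference */
    g->destroyed = destroyed;
    *destroyed = false;
    return g;
}

static void group_get(struct group *g)
{
    atomic_fetch_add(&g->refcnt, 1);
}

/* Drop a reference; the caller that drops the last one destroys it. */
static void group_put(struct group *g)
{
    if (atomic_fetch_sub(&g->refcnt, 1) == 1) {
        *g->destroyed = true;
        free(g);
    }
}
```

Without a get function, the count never rises above one, so every put is also the final put. That is the situation commit d8153d4d8 describes.
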
d8153d4d8 inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()

Currently fsnotify_put_group() decrements the ref count of a group and, if
it becomes 0, calls fsnotify_destroy_group(). Since a group's ref count is set
to 1 at group creation and never increased after that, a call to fsnotify_put_group()
always results in a call to fsnotify_destroy_group().
With this patch fsnotify_destroy_group() is called directly.

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2012-12-12 02:29:43 +0800

31 May, 2012

1 commit

a4f9a9a63 fsnotify: handle subfiles' perm events

I have recently been working on fanotify and found the following strange
behavior.

I wrote a program that sets an fanotify_mark on "/tmp/block" and responds
FAN_DENY to every notified event.

fanotify_mask = FAN_ALL_EVENTS | FAN_ALL_PERM_EVENTS | FAN_EVENT_ON_CHILD:
$ cd /tmp/block; cat foo
cat: foo: Operation not permitted

Operation on the file is blocked as expected.

But,

fanotify_mask = FAN_ALL_PERM_EVENTS | FAN_EVENT_ON_CHILD:
$ cd /tmp/block; cat foo
aaa

It's not blocked anymore, which is confusing. Also, judging from
commit "fsnotify: call fsnotify_parent in perm events", fsnotify
should handle subfiles' perm events just like the other notify
events.

With this patch, regardless of FAN_ALL_EVENTS set or not:
$ cd /tmp/block; cat foo
cat: foo: Operation not permitted

Operation on the file is now blocked properly.

FS_OPEN_PERM and FS_ACCESS_PERM are not listed in FS_EVENTS_POSS_ON_CHILD.
Because of the fsnotify_inode_watches_children() check, if you specify only
these events in fsnotify_mask, subfiles' perm events are not
notified.

This patch adds the events to FS_EVENTS_POSS_ON_CHILD so that they are notified
even if only these events are specified in fsnotify_mask.

Signed-off-by: Naohiro Aota
Cc: Eric Paris
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Naohiro Aota
2012-05-31 09:04:53 +0800
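
The effect of the patch can be modelled as a pure bitmask check. As I understand it, fsnotify_inode_watches_children() requires FS_EVENT_ON_CHILD and at least one bit from FS_EVENTS_POSS_ON_CHILD, so a perm-only mask fails the check until the perm bits join that set. In this sketch the bit values are illustrative, the "possible on child" sets are trimmed to a few events, and watches_children() is a hypothetical stand-in. The real definitions live in include/linux/fsnotify_backend.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; check fsnotify_backend.h for the real ones. */
#define FS_ACCESS         0x00000001u
#define FS_MODIFY         0x00000002u
#define FS_OPEN           0x00000020u
#define FS_OPEN_PERM      0x00010000u
#define FS_ACCESS_PERM    0x00020000u
#define FS_EVENT_ON_CHILD 0x08000000u

/* Trimmed "possible on child" sets before and after the patch. */
#define POSS_ON_CHILD_OLD (FS_ACCESS | FS_MODIFY | FS_OPEN)
#define POSS_ON_CHILD_NEW (POSS_ON_CHILD_OLD | FS_OPEN_PERM | FS_ACCESS_PERM)

/* The parent must ask for child events AND want at least one event
 * type that can be reported on a child. */
static bool watches_children(uint32_t mask, uint32_t poss_on_child)
{
    if (!(mask & FS_EVENT_ON_CHILD))
        return false;
    return (mask & poss_on_child) != 0;
}
```

A mask of FS_OPEN_PERM | FS_ACCESS_PERM | FS_EVENT_ON_CHILD fails with the old set and passes with the new one. Adding any ordinary event such as FS_OPEN makes it pass either way, which matches the FAN_ALL_EVENTS observation above.
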

27 Jul, 2011

1 commit

60063497a atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>.

Signed-off-by: Arun Sharma
Reviewed-by: Eric Dumazet
Cc: Ingo Molnar
Cc: David Miller
Cc: Eric Dumazet
Acked-by: Mike Frysinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Arun Sharma
2011-07-27 07:49:47 +0800

07 Jan, 2011

1 commit

b5c84bf6f fs: dcache remove dcache_lock

dcache_lock no longer protects anything. Remove it.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:23 +0800

08 Dec, 2010

1 commit

09e5f14e5 fanotify: on group destroy allow all waiters to bypass permission check

When fanotify_release() is called, there may still be processes waiting for
access permission. Currently only processes for which an event has already been
queued into the group's access list will be woken up. Processes for which no
event has been queued will continue to sleep and thus cause a deadlock when
fsnotify_put_group() is called.
Furthermore, there is a race that allows further processes to start waiting on the
access wait queue after the wake_up (if they arrive before clear_marks_by_group()
is called).
This patch corrects this by setting a flag to inform processes that the group
is about to be destroyed and thus not to wait for access permission.

[additional changelog from eparis]
Let's think about the 4 relevant code paths from the PoV of the
'operator', 'listener', 'responder' and 'closer'. The 'operator' is the
process doing an action (like open/read) which could require permission.
The 'listener' is the task (or in this case thread) tasked with reading from
the fanotify file descriptor. The 'responder' is the thread responsible
for responding to access requests. The 'closer' is the thread attempting to
close the fanotify file descriptor.

The 'operator' is going to end up in:
fanotify_handle_event()
get_response_from_access()
(THIS BLOCKS WAITING ON USERSPACE)

The 'listener' interesting code path
fanotify_read()
copy_event_to_user()
prepare_for_access_response()
(THIS CREATES AN fanotify_response_event)

The 'responder' code path:
fanotify_write()
process_access_response()
(REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator')

The 'closer':
fanotify_release()
(SUPPOSED TO CLEAN UP THE REST OF THIS MESS)

What we have today is that in the closer we remove all of the
fanotify_response_events and set a bit so no more response events are
ever created in prepare_for_access_response().

The bug is that we never wake all of the operators up and tell them to
move along. You fix that in fanotify_get_response_from_access(). You
also fix other operators which haven't gotten there yet. So I agree
that's a good fix.
[/additional changelog from eparis]

[remove additional changes to minimize patch size]
[move initialization so it was inside CONFIG_FANOTIFY_PERMISSION]

Signed-off-by: Lino Sanfilippo
Signed-off-by: Eric Paris

Lino Sanfilippo
2010-12-08 05:14:22 +0800
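
The operator/closer interaction described above can be modelled with a condition variable. The closer sets a "bypass" flag under the lock and then wakes every waiter. An operator re-checks that flag before and after sleeping, so one that arrives late never blocks at all. The sketch treats bypass as "allow", which is my reading of "grant permission for all outstanding events" in commit 2eebf582c. The struct, field and function names are hypothetical; the kernel uses a wait queue, not pthreads.

```c
#include <pthread.h>
#include <stdbool.h>

enum { RESP_NONE = 0, RESP_ALLOW = 1, RESP_DENY = 2 };

/* Hypothetical model of one group's permission machinery. */
struct perm_group {
    pthread_mutex_t lock;
    pthread_cond_t  wq;        /* operators sleep here */
    bool bypass_perm;          /* set once the group is being torn down */
};

struct perm_event {
    int response;              /* filled in by the responder */
};

/* 'Operator': block until a response arrives or the group is closing. */
static int wait_for_response(struct perm_group *g, struct perm_event *ev)
{
    pthread_mutex_lock(&g->lock);
    while (ev->response == RESP_NONE && !g->bypass_perm)
        pthread_cond_wait(&g->wq, &g->lock);
    int r = ev->response != RESP_NONE ? ev->response : RESP_ALLOW;
    pthread_mutex_unlock(&g->lock);
    return r;
}

/* 'Closer': set the flag first, then wake everyone, so operators that
 * have not started waiting yet also see the flag and never sleep. */
static void group_release(struct perm_group *g)
{
    pthread_mutex_lock(&g->lock);
    g->bypass_perm = true;
    pthread_cond_broadcast(&g->wq);
    pthread_mutex_unlock(&g->lock);
}
```

Because the flag is checked under the same lock the closer takes, there is no window in which an operator can miss both the flag and the wakeup.
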

29 Oct, 2010

7 commits

d8c0fca68 fsnotify: remove alignment padding from fsnotify_mark on 64 bit builds

Reorder struct fsnotify_mark to remove 8 bytes of alignment padding on 64
bit builds. This shrinks fsnotify_mark to 128 bytes, allowing more objects per
slab in its kmem_cache and reducing the number of cachelines needed for
each structure.

Signed-off-by: Richard Kennedy
Cc: Eric Paris
Signed-off-by: Andrew Morton
Signed-off-by: Eric Paris

Richard Kennedy
2010-10-29 05:22:16 +0800
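
The padding effect is easy to reproduce with two toy layouts. These are hypothetical structs, not the real struct fsnotify_mark. On an LP64 build each 4-byte int placed before an 8-byte pointer leaves a 4-byte hole, and moving the two ints next to each other removes 8 bytes in total. A tool such as pahole reports these holes for real kernel structures.

```c
/* Hypothetical layouts, not the real struct fsnotify_mark. */
struct mark_padded {
    int   mask;        /* 4 bytes + 4 bytes of padding on LP64 */
    void *group;       /* needs 8-byte alignment */
    int   refcnt;      /* 4 bytes + 4 bytes of padding on LP64 */
    void *inode;
};

struct mark_reordered {
    void *group;       /* pointers first ... */
    void *inode;
    int   mask;        /* ... then the two 4-byte fields share one 8-byte slot */
    int   refcnt;
};
```

On a typical x86_64 build the first struct is 32 bytes and the second is 24. On 32-bit builds both are 16 bytes, which is why the commit limits its claim to 64 bit builds.
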
b29866aab fsnotify: rename FS_IN_ISDIR to FS_ISDIR

The _IN_ in the naming is reserved for flags only used by inotify. Since I
am about to use this flag for fanotify rename it to be generic like the
rest.

Signed-off-by: Eric Paris

Eric Paris
2010-10-29 05:22:15 +0800
4afeff850 fanotify: limit number of listeners per user

fanotify currently has no limit on the number of listeners a given user can
have open. This patch limits the total number of listeners per user to
128. This is the same as the inotify default limit.

Signed-off-by: Eric Paris

Eric Paris
2010-10-29 05:22:15 +0800
e7099d8a5 fanotify: limit the number of marks in a single fanotify group

There is currently no limit on the number of marks a given fanotify group
can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as
a serious DoS threat. This patch implements a default limit of 8192, the same as
inotify, to work towards removing the CAP_SYS_ADMIN gating and eliminating
the default DoS'able status.

Signed-off-by: Eric Paris

Eric Paris
2010-10-29 05:22:14 +0800
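
Both limits (128 listeners per user in 4afeff850, 8192 marks per group in e7099d8a5) are instances of a capped counter. The userspace sketch below uses a compare-and-swap loop so that two concurrent callers cannot both slip in under the limit. The commits do not say whether the kernel check is this strict. The helper names are hypothetical, and the errno values passed in the usage (EMFILE, ENOSPC) are my choice for the example.

```c
#include <errno.h>
#include <stdatomic.h>

/* Defaults quoted in the two commit messages. */
#define DEFAULT_MAX_LISTENERS 128
#define DEFAULT_MAX_MARKS     8192

/* Try to take one unit of a limited resource; returns 0 or -err. */
static int charge_limited(atomic_int *count, int max, int err)
{
    int cur = atomic_load(count);
    do {
        if (cur >= max)
            return -err;
        /* on failure, cur is reloaded with the current value */
    } while (!atomic_compare_exchange_weak(count, &cur, cur + 1));
    return 0;
}

/* Give the unit back, e.g. when a listener closes its fd. */
static void uncharge_limited(atomic_int *count)
{
    atomic_fetch_sub(count, 1);
}
```
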
52420392c fsnotify: call fsnotify_parent in perm events

fsnotify perm events do not call fsnotify_parent(). That means you cannot
register a perm event on a directory and enforce permissions on all inodes in
that directory. This patch fixes that situation.

Signed-off-by: Eric Paris

Eric Paris
2010-10-29 05:22:13 +0800
ff8bcbd03 fsnotify: correctly handle return codes from listeners

When fsnotify groups return errors they are ignored. For permission
events these errors should be passed back up the stack, but for most events they
should continue to be ignored.

Signed-off-by: Eric Paris

Eric Paris
2010-10-29 05:22:13 +0800
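
The rule "propagate errors for permission events, ignore them otherwise" can be sketched as a delivery loop. This sketch stops at the first error for a permission event, so later listeners are not asked. That matches my understanding of the kernel loop, but treat it as an assumption. The handler type, the function name and the bit values are hypothetical or illustrative.

```c
#include <errno.h>
#include <stdint.h>

#define FS_ACCESS       0x00000001u   /* illustrative values */
#define FS_OPEN_PERM    0x00010000u
#define FS_ACCESS_PERM  0x00020000u
#define ALL_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM)

/* Hypothetical listener callback: returns 0 or a negative errno. */
typedef int (*handler_fn)(uint32_t mask);

/* Deliver @mask to every listener. For permission events the first error
 * is returned to the caller; for other events errors are dropped. */
static int send_to_listeners(handler_fn *handlers, int n, uint32_t mask)
{
    for (int i = 0; i < n; i++) {
        int ret = handlers[i](mask);
        if (ret && (mask & ALL_PERM_EVENTS))
            return ret;
    }
    return 0;
}
```
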
6ad2d4e3e fsnotify: implement ordering between notifiers

fanotify needs to be able to specify that some groups get events before
others. It uses this idea to make sure that a hierarchical storage
manager gets access to files before programs which actually use them. This
is purely infrastructure. Everything will have a priority of 0, but the
infrastructure will exist for it to be non-zero.

Signed-off-by: Eric Paris

Eric Paris
2010-10-29 05:22:13 +0800
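
The ordering rule can be illustrated with a sorted insert: higher priority first, so a hierarchical storage manager registered with a higher priority sees events before ordinary listeners. The sketch keeps groups of equal priority in insertion order, which is my choice. It uses an array rather than the kernel's linked lists, and the type and function are hypothetical.

```c
/* Hypothetical group record; only the priority matters here. */
struct group {
    int priority;
    const char *name;
};

/* Insert @g into @list (length *n, capacity cap), keeping the list sorted
 * by descending priority; equal priorities keep insertion order.
 * Returns 0, or -1 if the list is full. */
static int insert_by_priority(struct group **list, int *n, int cap,
                              struct group *g)
{
    if (*n >= cap)
        return -1;
    int i = *n;
    while (i > 0 && list[i - 1]->priority < g->priority) {
        list[i] = list[i - 1];   /* shift lower-priority entries right */
        i--;
    }
    list[i] = g;
    (*n)++;
    return 0;
}
```
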

23 Aug, 2010

1 commit

2eebf582c fanotify: flush outstanding perm requests on group destroy

When an fanotify listener is closing, it may cause a deadlock between the
listener and the original task doing an fs operation. If the original task
is waiting for a permission response, it will be holding the srcu lock. The
listener cannot clean up and exit until after that srcu lock is synchronized.
Thus deadlock. The fix introduced here is to stop accepting new permission
events when a listener is shutting down and to grant permission for all
outstanding events. Thus the original task will eventually release the srcu
lock and the listener can complete shutdown.

Reported-by: Andreas Gruenbacher
Cc: Andreas Gruenbacher
Signed-off-by: Eric Paris

Eric Paris
2010-08-23 08:28:16 +0800

13 Aug, 2010

1 commit

2069601b3 Revert "fsnotify: store struct file not struct path"

This reverts commit 3bcf3860a4ff9bbc522820b4b765e65e4deceb3e (and the
accompanying commit c1e5c954020e "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).

The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.

Fix up various conflicts due to later fsnotify work.

Signed-off-by: Linus Torvalds

Linus Torvalds
2010-08-13 05:23:04 +0800

28 Jul, 2010

18 commits

1968f5eed fanotify: use both marks when possible

Currently, when given a vfsmount_mark, fanotify looks up the corresponding
inode mark (if it exists). This patch drops that lookup and uses the
mark provided.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:55 +0800
ce8f76fb7 fsnotify: pass both the vfsmount mark and inode mark

should_send_event() and handle_event() will both need to look up the inode
event if they get a vfsmount event. Let's just pass both at the same time
since we have them both after walking the lists in lockstep.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:54 +0800
02436668d fsnotify: remove global fsnotify groups lists

The global fsnotify groups lists were invented as a way to increase the
performance of fsnotify by shortcutting events which were not interesting.
With the changes to walk the object lists rather than global groups lists
these shortcuts are not useful.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:54 +0800
43709a288 fsnotify: remove group->mask

group->mask is now useless. It was originally a shortcut for fsnotify to
save on performance. These checks are now redundant, so we remove them.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:54 +0800
03930979a fsnotify: remove the global masks

Because we walk the object->fsnotify_marks list instead of the global
fsnotify groups list we don't need the fsnotify_inode_mask and
fsnotify_vfsmount_mask as these were simply shortcuts in fsnotify() for
performance. They are now extra checks, rip them out.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:54 +0800
3a9b16b40 fsnotify: send fsnotify_mark to groups in event handling functions

With the change of fsnotify to use srcu walking the marks list instead of
walking the global groups list we now know the mark in question. The code can
send the mark to the group's handling functions and the groups won't have to
find those marks themselves.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:52 +0800
75c1be487 fsnotify: srcu to protect read side of inode and vfsmount locks

Currently the inode->i_fsnotify_marks and vfsmount->mnt_fsnotify_marks
lists are protected by a spinlock on both the read and the write side.
This patch protects the read side of those lists with a single new srcu.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:52 +0800
700307a29 fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called

Currently fsnotify checks whether mark->group is NULL to decide if
fsnotify_destroy_mark() has already been called. With the upcoming
rcu work it is a heck of a lot easier to use an explicit flag than worry
about group being set to NULL.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:52 +0800
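
An explicit "alive" flag makes teardown idempotent without overloading the group pointer. In this sketch the first caller atomically clears the flag and performs the teardown, and every later caller sees the flag already cleared and backs off. The flag name, the struct and the use of an atomic test-and-clear are assumptions for illustration. The commit message does not say how the kernel synchronizes the flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MARK_FLAG_ALIVE 0x1u   /* hypothetical flag name */

struct mark {
    atomic_uint flags;
    void *group;   /* no longer doubles as a "destroyed" marker */
};

/* Returns true only for the one caller that actually performs the
 * teardown; later callers find the flag already cleared. */
static bool mark_destroy(struct mark *m)
{
    unsigned int old = atomic_fetch_and(&m->flags, ~MARK_FLAG_ALIVE);
    if (!(old & MARK_FLAG_ALIVE))
        return false;   /* someone else got here first */
    /* ... detach from the object and group lists here ... */
    return true;
}
```
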
3bcf3860a fsnotify: store struct file not struct path

Al explains that calling dentry_open() with a mnt/dentry pair is only
guaranteed to be safe if they are already used in an open struct file. To
make sure this is the case, don't store and use a struct path in fsnotify;
always use a struct file.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:51 +0800
f70ab54cc fsnotify: fsnotify_add_notify_event should return an event

Rather than using the horrific void ** argument and such just to pass the
fanotify_merge event back to the caller of fsnotify_add_notify_event(), have
those functions return an event if it was different from the event suggested
to be added.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:50 +0800
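
The "return the event instead of filling a void **" idea can be sketched with a queue whose add function returns NULL when the new event itself was queued, or the already-queued event that absorbed it. The merge callback here is an inotify-style tail drop: an event identical to the last queued one is not queued again. The types and functions are hypothetical, and this return convention is modelled on the commit message rather than copied from the kernel's signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event and queue types. */
struct event {
    unsigned int mask;
    char name[16];
    struct event *next;
};

struct queue {
    struct event *head, *tail;
};

/* Tail drop: if @ev matches the last queued event, that event stands in. */
static struct event *merge_tail_drop(struct queue *q, struct event *ev)
{
    struct event *last = q->tail;
    if (last && last->mask == ev->mask && strcmp(last->name, ev->name) == 0)
        return last;
    return NULL;
}

/* Returns NULL if @ev itself was queued, otherwise the queued event that
 * absorbed it, so callers need no void ** out-parameter. */
static struct event *add_event(struct queue *q, struct event *ev,
                               struct event *(*merge)(struct queue *, struct event *))
{
    struct event *merged = merge ? merge(q, ev) : NULL;
    if (merged)
        return merged;
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    return NULL;
}
```
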
80af25886 fanotify: groups can specify their f_flags for new fd

Currently fanotify fds opened for their listeners are done with f_flags
equal to O_RDONLY | O_LARGEFILE. This patch instead takes f_flags from the
fanotify_init syscall and uses those when opening files in the context of
the listener.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:50 +0800
20dee624c fsnotify: check to make sure all fsnotify bits are unique

This patch adds a check to make sure that all fsnotify bits are unique and we
cannot accidentally use the same bit for 2 different fsnotify event types.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:50 +0800
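
One way to check that flag bits never collide relies on a simple fact. For non-negative values without overflow, the sum of the flags equals their bitwise OR exactly when no two flags share a bit. That makes a compile-time check possible. The runtime helper additionally insists that each flag is a single bit. The EV_* names and values are hypothetical. The kernel's own check may be written differently, for example with BUILD_BUG_ON().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of event bits. */
#define EV_ACCESS 0x01u
#define EV_MODIFY 0x02u
#define EV_ATTRIB 0x04u
#define EV_OPEN   0x20u

/* Sum == OR exactly when no two flags overlap, so this fails to compile
 * if someone reuses a bit. */
_Static_assert((EV_ACCESS + EV_MODIFY + EV_ATTRIB + EV_OPEN) ==
               (EV_ACCESS | EV_MODIFY | EV_ATTRIB | EV_OPEN),
               "event bits overlap");

/* Runtime version: every flag must be a single bit and none may repeat. */
static bool flags_unique(const unsigned int *flags, size_t n)
{
    unsigned int seen = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int f = flags[i];
        if (f == 0 || (f & (f - 1)) != 0)
            return false;   /* zero, or more than one bit set */
        if (seen & f)
            return false;   /* bit already used by an earlier flag */
        seen |= f;
    }
    return true;
}
```
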
8c1934c8d inotify: allow users to request not to receive events on unlinked children

An inotify watch on a directory will send events for children even if those
children have been unlinked. This patch adds a new inotify flag IN_EXCL_UNLINK
which allows a watch to specify that it doesn't care about unlinked children.
This should fix performance problems seen by tasks which add a watch to
/tmp and then are overrun with events when other processes are reading and
writing to unlinked files they created in /tmp.

https://bugzilla.kernel.org/show_bug.cgi?id=16296

Requested-by: Matthias Clasen
Signed-off-by: Eric Paris

Eric Paris
2010-07-28 22:18:49 +0800
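
From userspace the new flag is simply OR-ed into the inotify_add_watch() mask. IN_EXCL_UNLINK is declared in glibc's <sys/inotify.h>. The program below reproduces the /tmp scenario in miniature. It watches a fresh temporary directory for IN_MODIFY, creates a file, unlinks it while keeping it open, writes to it, and then counts the queued events. The helper name events_after_unlink() and the temporary path template are hypothetical, and the program needs a Linux system with inotify.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Returns how many inotify events were queued after writing to an
 * unlinked child, or -1 if the setup failed. */
static int events_after_unlink(unsigned int extra_flags)
{
    char dir[] = "/tmp/exclXXXXXX";
    char path[PATH_MAX];
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int in = -1, fd = -1, count = -1;
    ssize_t len;

    if (!mkdtemp(dir))
        return -1;
    snprintf(path, sizeof(path), "%s/scratch", dir);

    in = inotify_init1(IN_NONBLOCK);
    if (in < 0 || inotify_add_watch(in, dir, IN_MODIFY | extra_flags) < 0)
        goto out;

    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        goto out;
    unlink(path);                  /* the child is now unlinked but still open */
    if (write(fd, "x", 1) != 1)
        goto out;

    count = 0;
    /* IN_NONBLOCK: read() fails with EAGAIN once the queue is empty. */
    while ((len = read(in, buf, sizeof(buf))) > 0) {
        for (char *p = buf; p < buf + len;) {
            struct inotify_event *ev = (struct inotify_event *)p;
            count++;
            p += sizeof(*ev) + ev->len;
        }
    }

out:
    if (fd >= 0)
        close(fd);
    if (in >= 0)
        close(in);
    rmdir(dir);
    return count;
}
```

With IN_EXCL_UNLINK the write to the unlinked file should produce no event. Calling events_after_unlink(0) shows the pre-patch behaviour, where the directory watch still reports IN_MODIFY for the unlinked child.
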
08ae89380 fanotify: drop the useless priority argument

The priority argument in fanotify is useless. Kill it.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:59:03 +0800
9e66e4233 fanotify: permissions and blocking

This is the backend work needed for fanotify to support the new
FS_OPEN_PERM and FS_ACCESS_PERM fsnotify events. This is done using the
new fsnotify secondary queue. No userspace interface is provided to
actually respond to or request these events.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:59:02 +0800
c4ec54b40 fsnotify: new fsnotify hooks and events types for access decisions

Introduce a new fsnotify hook, fsnotify_perm(), which is called from the
security code. This hook is used to allow fsnotify groups to make access
control decisions about events on the system. We also must change the
generic fsnotify function to return an error code if we intend these hooks
to be in any way useful.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:59:01 +0800
59b0df211 fsnotify: use unsigned char * for dentry->d_name.name

fsnotify was using char * when it passed around the d_name.name string
internally, but it is actually an unsigned char *. This patch switches
fsnotify to use unsigned char * and should silence some pointer signedness
warnings which have popped out of xfs. I do not add -Wpointer-sign to the fsnotify
code as there are still issues with kstrdup and strlen which would pop
out needless warnings.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:59:01 +0800
6e5f77b32 fsnotify: introduce a notification merge argument

Each group can define its own notification (and secondary_q) merge
function. Inotify does tail drop; fanotify does matching and drop, which
can actually allocate a completely new event. But for fanotify to properly
deal with permissions events it needs to know the new event which was
ultimately added to the notification queue. This patch just implements a
void ** argument which is passed to the merge function. fanotify can use
this field to pass the new event back to higher layers.

Signed-off-by: Eric Paris

Eric Paris
2010-07-28 21:59:01 +0800