21 Dec, 2012

1 commit

  • Pull filesystem notification updates from Eric Paris:
    "This pull mostly is about locking changes in the fsnotify system. By
    switching the group lock from a spin_lock() to a mutex() we can now
    hold the lock across things like iput(). This fixes a problem
    involving unmounting a fs and having inodes be busy, first pointed out
    by FAT, but reproducible with tmpfs.

    This also restores signal driven I/O for inotify, which has been
    broken since about 2.6.32."

    Ugh. I *hate* the timing of this. It was rebased after the merge
    window opened, and then left to sit with the pull request coming the day
    before the merge window closes. That's just crap. But apparently the
    patches themselves have been around for over a year, just gathering
    dust, so now it's suddenly critical.

    Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen
    Rothwell's fixes from -next.

    * 'for-next' of git://git.infradead.org/users/eparis/notify:
    inotify: automatically restart syscalls
    inotify: dont skip removal of watch descriptor if creation of ignored event failed
    fanotify: dont merge permission events
    fsnotify: make fasync generic for both inotify and fanotify
    fsnotify: change locking order
    fsnotify: dont put marks on temporary list when clearing marks by group
    fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
    fsnotify: pass group to fsnotify_destroy_mark()
    fsnotify: use a mutex instead of a spinlock to protect a groups mark list
    fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
    fsnotify: take groups mark_lock before mark lock
    fsnotify: use reference counting for groups
    fsnotify: introduce fsnotify_get_group()
    inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()

    Linus Torvalds
     

18 Dec, 2012

3 commits

  • The kernel keeps the FAN_MARK_IGNORED_SURV_MODIFY bit separately from
    fsnotify_mark::mask|ignored_mask, so put it in the @mflags (mark flags)
    field so that a user-space reader can detect whether the bit was
    used when the mark was created.

    | pos: 0
    | flags: 04002
    | fanotify flags:10 event-flags:0
    | fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
    | fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4

    Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • This allows us to print out fsnotify details such as the watchee inode,
    device, mask and, optionally, a file handle.

    For inotify objects, if the kernel is compiled with exportfs support, the
    output will be

    | pos: 0
    | flags: 02000000
    | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
    | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153
    | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:49b1060023552153

    If the kernel is compiled without exportfs support, the file handle
    is not provided, only the inode and device.

    | pos: 0
    | flags: 02000000
    | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0
    | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0
    | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0

    For fanotify the output is like

    | pos: 0
    | flags: 04002
    | fanotify flags:10 event-flags:0
    | fanotify mnt_id:12 mask:3b ignored_mask:0
    | fanotify ino:50205 sdev:800013 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:05020500fb1d47e7

    To minimize impact on general fsnotify code the new functionality
    is gathered in fs/notify/fdinfo.c file.

    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Alexey Dobriyan
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: "Aneesh Kumar K.V"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
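
    A reader of /proc/<pid>/fdinfo/<fd> could pick the inotify fields apart as
    sketched below. This is a minimal userspace sketch, not part of the patch:
    struct inotify_fdinfo and parse_inotify_fdinfo() are hypothetical names, and
    the sketch assumes that every field after wd: is hexadecimal, as the sample
    output suggests. The optional file handle fields are ignored.

```c
#include <stdio.h>

/* Hypothetical container for one "inotify ..." fdinfo line. */
struct inotify_fdinfo {
    int wd;
    unsigned long ino;
    unsigned int sdev;
    unsigned int mask;
    unsigned int ignored_mask;
};

/*
 * Parse one line such as
 *   inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0
 * Returns 0 on success, -1 if the line does not match. On failure the
 * contents of *out are unspecified (some fields may have been written).
 */
int parse_inotify_fdinfo(const char *line, struct inotify_fdinfo *out)
{
    int n = sscanf(line,
                   "inotify wd:%d ino:%lx sdev:%x mask:%x ignored_mask:%x",
                   &out->wd, &out->ino, &out->sdev, &out->mask,
                   &out->ignored_mask);

    return n == 5 ? 0 : -1;
}
```

    Lines that carry the trailing fhandle-bytes/fhandle-type/f_handle fields
    parse the same way, because sscanf() stops after the fifth conversion.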
     
  • Fixes following sparse warning:

    fs/notify/inode_mark.c:127:22: warning: symbol 'fsnotify_find_inode_mark_locked' was not declared. Should it be static?

    Signed-off-by: Tushar Behera
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tushar Behera
     

14 Dec, 2012

1 commit

  • Pull trivial branch from Jiri Kosina:
    "Usual stuff -- comment/printk typo fixes, documentation updates, dead
    code elimination."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    HOWTO: fix double words typo
    x86 mtrr: fix comment typo in mtrr_bp_init
    propagate name change to comments in kernel source
    doc: Update the name of profiling based on sysfs
    treewide: Fix typos in various drivers
    treewide: Fix typos in various Kconfig
    wireless: mwifiex: Fix typo in wireless/mwifiex driver
    messages: i2o: Fix typo in messages/i2o
    scripts/kernel-doc: check that non-void fcts describe their return value
    Kernel-doc: Convention: Use a "Return" section to describe return values
    radeon: Fix typo and copy/paste error in comments
    doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
    various: Fix spelling of "asynchronous" in comments.
    Fix misspellings of "whether" in comments.
    eisa: Fix spelling of "asynchronous".
    various: Fix spelling of "registered" in comments.
    doc: fix quite a few typos within Documentation
    target: iscsi: fix comment typos in target/iscsi drivers
    treewide: fix typo of "suport" in various comments and Kconfig
    treewide: fix typo of "suppport" in various comments
    ...

    Linus Torvalds
     

12 Dec, 2012

14 commits

  • We were mistakenly returning EINTR when we found an outstanding signal.
    Instead we should return ERESTARTSYS and allow the kernel to handle
    things the right way.

    Patch-from: Oleg Nesterov
    Signed-off-by: Eric Paris

    Eric Paris
     
  • In inotify_ignored_and_remove_idr() the removal of a watch descriptor is skipped
    if the allocation of an ignored event fails, so we leak memory (the
    watch descriptor and the mark linked to it).
    This patch ensures that the watch descriptor is removed regardless of whether
    event creation failed or not.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Boyd Yang reported a problem in the case where multiple threads of the same
    thread group are waiting for a response to a permission event.
    In this case it is possible that some of the threads are never woken up, even
    if the response for the event has been received
    (see http://marc.info/?l=linux-kernel&m=131822913806350&w=2).

    The reason is that we are currently merging permission events if they belong to
    the same thread group. But we are not prepared to wake up more than one waiter
    for each event. We do

    wait_event(group->fanotify_data.access_waitq, event->response ||
    atomic_read(&group->fanotify_data.bypass_perm));
    and after that
    event->response = 0;

    which is the reason that, even if we woke up all waiters for the same event,
    some of them may see event->response already set back to 0, then go back to
    sleep and block forever.

    With this patch we avoid having more than one thread wait for a response
    by no longer merging permission events for the same thread group.

    Reported-by: Boyd Yang
    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
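
    The lost wakeup can be replayed in a single-threaded model. The following
    is only a sketch: struct perm_event and waiter_wakes() are hypothetical
    stand-ins for the merged event and for the condition each waiter re-checks
    after a wake-up, and the bypass_perm check is left out.

```c
/* Hypothetical stand-in for a permission event shared by merged waiters. */
struct perm_event {
    int response;
};

/*
 * Models one waiter re-checking its condition after a wake-up.
 * Returns 1 if the waiter proceeds (and resets the response, as the
 * code quoted in the commit message does), 0 if it would sleep again.
 */
int waiter_wakes(struct perm_event *event)
{
    if (!event->response)
        return 0;           /* condition false: back to sleep */
    event->response = 0;    /* consume the response */
    return 1;
}
```

    With the response set once, the first call returns 1 and a second call on
    the same event returns 0: the second waiter never observes the response.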
     
  • inotify is supposed to support async signal notification when information
    is available on the inotify fd. This patch moves that support to generic
    fsnotify functions so it can be used by all notification mechanisms.

    Signed-off-by: Eric Paris

    Eric Paris
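
    From user space, signal-driven I/O on a descriptor is requested through
    fcntl(). A minimal sketch, assuming a Linux system; enable_sigio() is a
    hypothetical helper name, and installing a SIGIO handler is left to the
    caller.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to send SIGIO to this process when fd becomes readable.
 * Returns 0 on success, -1 on error (errno is set by fcntl()).
 */
int enable_sigio(int fd)
{
    int flags;

    /* Route the signal to the calling process. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    /* O_ASYNC turns on signal-driven notification for this descriptor. */
    if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1)
        return -1;

    return 0;
}
```

    The same helper could be applied to a descriptor returned by
    inotify_init1() or fanotify_init() once the handler is in place.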
     
  • On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
    >
    > I finally built and tested a v3.0 kernel with these patches (I know I'm
    > SOOOOOO far behind). Not what I hoped for:
    >
    > > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day...
    > > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
    > > [ 150.946012] IP: [] shmem_free_inode+0x18/0x50
    > > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0
    > > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    > > [ 150.946012] CPU 0
    > > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
    > > [ 150.946012]
    > > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
    > > [ 150.946012] RIP: 0010:[] [] shmem_free_inode+0x18/0x50
    > > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282
    > > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
    > > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
    > > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
    > > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
    > > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
    > > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
    > > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    > > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
    > > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
    > > [ 150.946012] Stack:
    > > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
    > > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
    > > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
    > > [ 150.946012] Call Trace:
    > > [ 150.946012] [] shmem_evict_inode+0x76/0x130
    > > [ 150.946012] [] evict+0x7e/0x170
    > > [ 150.946012] [] iput_final+0xd0/0x190
    > > [ 150.946012] [] iput+0x33/0x40
    > > [ 150.946012] [] fsnotify_destroy_mark_locked+0x145/0x160
    > > [ 150.946012] [] fsnotify_destroy_mark+0x36/0x50
    > > [ 150.946012] [] sys_inotify_rm_watch+0x77/0xd0
    > > [ 150.946012] [] system_call_fastpath+0x16/0x1b
    > > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00
    > > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a
    > > [ 150.946012] RIP [] shmem_free_inode+0x18/0x50
    > > [ 150.946012] RSP
    > > [ 150.946012] CR2: 0000000000000070
    >
    > Looks an awful lot like the problem from:
    > http://www.spinics.net/lists/linux-fsdevel/msg46101.html
    >

    I tried to reproduce this bug with your test program, but without success.
    However, if I understand correctly, this occurs because we don't hold any locks when
    we call iput() in mark_destroy(), right?
    With the patches you tested, iput() is also not called under any lock, since the
    group's mark_mutex is released temporarily before iput() is called. This is because
    the original code behaves similarly.
    However, since we now have a mutex as the biggest lock, we can do what you
    suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and
    call iput() with the mutex held to avoid the race.
    The patch below implements this. It uses nested locking to avoid deadlock in case
    we do the final iput() on an inode which still holds marks and thus would take
    the mutex again when calling fsnotify_inode_delete() in destroy_inode().

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • In clear_marks_by_group_flags() the mark list of a group is iterated and the
    marks are put on a temporary list.
    Since we introduced fsnotify_destroy_mark_locked() we no longer need the temp list
    and can remove the marks while the mark list is iterated and
    the mark list mutex is held.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked()
    which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but
    assume that the caller has already taken the group's mark mutex.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • In fsnotify_destroy_mark() don't get the group from the passed mark anymore,
    but pass the group itself as an additional parameter to the function.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Replaces the group's mark_lock spinlock with a mutex. Using a mutex instead
    of a spinlock results in more flexibility (i.e. it allows sleeping while the
    lock is held).

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • …ark should be destroyed

    This patch adds an extra flag to mark_remove_from_mask() to inform the caller if
    the mark should be destroyed.
    With this we no longer destroy the mark implicitly in the function itself
    but let the caller handle it.

    Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
    Signed-off-by: Eric Paris <eparis@redhat.com>

    Lino Sanfilippo
     
  • Race-free addition and removal of a mark to a group's mark list would be easier
    if we could lock the mark list of the group before we lock the specific mark.
    This patch changes the order used to add/remove marks to/from mark lists from

    1. mark->lock
    2. group->mark_lock
    3. inode->i_lock

    to

    1. group->mark_lock
    2. mark->lock
    3. inode->i_lock

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
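
    The new ordering can be illustrated with a userspace sketch that uses
    pthread mutexes in place of the kernel locks. All names here (struct group,
    struct mark, attach_mark(), detach_mark()) are hypothetical, and the inode
    lock is omitted to keep the example short.

```c
#include <pthread.h>
#include <stddef.h>

struct mark {
    pthread_mutex_t lock;
    struct mark *next;
    int attached;
};

struct group {
    pthread_mutex_t mark_lock;   /* always taken before any mark->lock */
    struct mark *marks;
    int num_marks;
};

void attach_mark(struct group *g, struct mark *m)
{
    pthread_mutex_lock(&g->mark_lock);   /* 1. group->mark_lock */
    pthread_mutex_lock(&m->lock);        /* 2. mark->lock */
    if (!m->attached) {
        m->next = g->marks;
        g->marks = m;
        m->attached = 1;
        g->num_marks++;
    }
    pthread_mutex_unlock(&m->lock);
    pthread_mutex_unlock(&g->mark_lock);
}

void detach_mark(struct group *g, struct mark *m)
{
    struct mark **pp;

    pthread_mutex_lock(&g->mark_lock);   /* same order as attach_mark() */
    pthread_mutex_lock(&m->lock);
    for (pp = &g->marks; *pp; pp = &(*pp)->next) {
        if (*pp == m) {
            *pp = m->next;               /* unlink while the list is locked */
            m->next = NULL;
            m->attached = 0;
            g->num_marks--;
            break;
        }
    }
    pthread_mutex_unlock(&m->lock);
    pthread_mutex_unlock(&g->mark_lock);
}
```

    Because the list lock is held first, the list cannot change between
    finding a mark and locking it, which is the simplification the commit
    message is after.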
     
  • Get a group ref for each mark that is added to the group's list and release that
    ref when the mark is freed in fsnotify_put_mark().
    We also get a group reference for duplicated marks and for private event
    data.
    Now we no longer free a group when the number of marks becomes 0 but when
    the group's ref count does. Since this will only happen when all marks are removed
    from a group's mark list, we don't have to set the group's number of marks to 1 at
    group creation.

    Besides clearing all marks in fsnotify_destroy_group() we also flush the
    group's event queue. This is because events may hold references to groups (due to
    private event data) and we have to put those references first before we get a
    chance to put the final ref, which will result in a call to
    fsnotify_final_destroy_group().

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
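
    The lifetime rule described here, one reference for the creator plus one
    per mark, can be sketched in userspace with C11 atomics. The names (struct
    group, group_init(), group_get(), group_put(), and the destroyed flag used
    to observe teardown) are hypothetical and only stand in for the kernel
    helpers.

```c
#include <stdatomic.h>

struct group {
    atomic_int refcnt;
    int destroyed;   /* set instead of freeing, so the sketch can be inspected */
};

void group_init(struct group *g)
{
    atomic_init(&g->refcnt, 1);   /* the creator holds the first reference */
    g->destroyed = 0;
}

/* Taken once for every mark (or event) that points at the group. */
void group_get(struct group *g)
{
    atomic_fetch_add(&g->refcnt, 1);
}

void group_put(struct group *g)
{
    /* fetch_sub returns the previous value: 1 means this was the last ref. */
    if (atomic_fetch_sub(&g->refcnt, 1) == 1)
        g->destroyed = 1;
}
```

    In this model the group outlives its creator's put for as long as any mark
    still holds a reference, and teardown runs exactly once, on the final put.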
     
  • Introduce fsnotify_get_group() which increments the reference counter of a group.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Currently in fsnotify_put_group() the ref count of a group is decremented and if
    it becomes 0 fsnotify_destroy_group() is called. Since a group's ref count is
    set to 1 only at group creation and never increased after that, a call to fsnotify_put_group()
    always results in a call to fsnotify_destroy_group().
    With this patch fsnotify_destroy_group() is called directly.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     

19 Nov, 2012

3 commits


09 Nov, 2012

1 commit

  • Anders Blomdell noted in 2010 that Fanotify lost events and provided a
    test case. Eric Paris confirmed it was a bug and posted a fix to the
    list

    https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE

    but never applied it. Repeated attempts over time to actually get him
    to apply it have never had a reply from anyone who has raised it.

    So apply it anyway.

    Signed-off-by: Alan Cox
    Reported-by: Anders Blomdell
    Cc: Eric Paris
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Paris
     

27 Sep, 2012

2 commits


23 Jul, 2012

1 commit


14 Jul, 2012

1 commit


31 May, 2012

1 commit


24 Mar, 2012

1 commit


15 Jan, 2012

1 commit

  • Removing the parent of a watched file results in "kernel BUG at
    fs/notify/mark.c:139".

    To reproduce

    add "-w /tmp/audit/dir/watched_file" to audit.rules
    rm -rf /tmp/audit/dir

    This is caused by fsnotify_destroy_mark() being called without an
    extra reference taken by the caller.

    Reported by Francesco Cosoleto here:

    https://bugzilla.novell.com/show_bug.cgi?id=689860

    Fix by removing the BUG_ON and adding a comment about not accessing mark after
    the iput.

    Signed-off-by: Miklos Szeredi
    CC: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     

04 Jan, 2012

1 commit


27 Jul, 2011

1 commit

  • This allows us to move duplicated code in
    (atomic_inc_not_zero() for now) to

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

08 Apr, 2011

1 commit


06 Apr, 2011

1 commit

  • On an error path in inotify_init1 a normal user can trigger a double
    free of struct user. This is a regression introduced by a2ae4cc9a16e
    ("inotify: stop kernel memory leak on file creation failure").

    We fix this by making sure that if a group exists the user reference is
    dropped when the group is cleaned up. We should not explicitly drop the
    reference on error and also drop the reference when the group is cleaned
    up.

    The new lifetime rules are that an inotify group lives from
    inotify_new_group to the last fsnotify_put_group. Since the struct user
    and inotify_devs are directly tied to this lifetime they are only
    changed/updated in those two locations. We get rid of all special
    casing of struct user or user->inotify_devs.

    Signed-off-by: Eric Paris
    Cc: stable@kernel.org (2.6.37 and up)
    Signed-off-by: Linus Torvalds

    Eric Paris
     

31 Mar, 2011

1 commit


25 Mar, 2011

3 commits

  • All that remains of the inode_lock is protecting the inode hash list
    manipulation and traversals. Rename the inode_lock to
    inode_hash_lock to reflect its actual function.

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     
  • Protect the per-sb inode list with a new global lock
    inode_sb_list_lock and use it to protect the list manipulations and
    traversals. This lock replaces the inode_lock as the inodes on the
    list can be validity checked while holding the inode->i_lock and
    hence the inode_lock is no longer needed to protect the list.

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     
  • Protect inode state transitions and validity checks with the
    inode->i_lock. This enables us to make inode state transitions
    independently of the inode_lock and is the first step to peeling
    away the inode_lock from the code.

    This requires that __iget() is done atomically with i_state checks
    during list traversals so that we don't race with another thread
    marking the inode I_FREEING between the state check and grabbing the
    reference.

    Also remove the unlock_new_inode() memory barrier optimisation
    required to avoid taking the inode_lock when clearing I_NEW.
    Simplify the code by simply taking the inode->i_lock around the
    state change and wakeup. Because the wakeup is no longer tricky,
    remove the wake_up_inode() function and open code the wakeup where
    necessary.

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
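
    The requirement that the state check and the reference grab happen under
    one lock can be sketched in userspace. This is an illustration under stated
    assumptions: struct inode_model, MODEL_I_FREEING and try_grab_inode() are
    hypothetical names, and a pthread mutex stands in for inode->i_lock.

```c
#include <pthread.h>

#define MODEL_I_FREEING 0x1   /* stand-in for the kernel's I_FREEING bit */

struct inode_model {
    pthread_mutex_t i_lock;
    unsigned int i_state;
    int i_count;
};

/* Returns 1 and takes a reference if the inode is not being freed, else 0. */
int try_grab_inode(struct inode_model *inode)
{
    int grabbed = 0;

    pthread_mutex_lock(&inode->i_lock);
    if (!(inode->i_state & MODEL_I_FREEING)) {
        inode->i_count++;   /* taken while the state is still known */
        grabbed = 1;
    }
    pthread_mutex_unlock(&inode->i_lock);

    return grabbed;
}
```

    Checking the state and bumping the count inside one critical section
    leaves no window in which another thread could mark the inode as freeing
    between the two steps.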
     

01 Mar, 2011

1 commit


14 Jan, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    Documentation/trace/events.txt: Remove obsolete sched_signal_send.
    writeback: fix global_dirty_limits comment runtime -> real-time
    ppc: fix comment typo singal -> signal
    drivers: fix comment typo diable -> disable.
    m68k: fix comment typo diable -> disable.
    wireless: comment typo fix diable -> disable.
    media: comment typo fix diable -> disable.
    remove doc for obsolete dynamic-printk kernel-parameter
    remove extraneous 'is' from Documentation/iostats.txt
    Fix spelling milisec -> ms in snd_ps3 module parameter description
    Fix spelling mistakes in comments
    Revert conflicting V4L changes
    i7core_edac: fix typos in comments
    mm/rmap.c: fix comment
    sound, ca0106: Fix assignment to 'channel'.
    hrtimer: fix a typo in comment
    init/Kconfig: fix typo
    anon_inodes: fix wrong function name in comment
    fix comment typos concerning "consistent"
    poll: fix a typo in comment
    ...

    Fix up trivial conflicts in:
    - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
    - fs/ext4/ext4.h

    Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.

    Linus Torvalds