10 Jul, 2013

6 commits

  • There have been changes in the locking scheme of fsnotify but the
    comments in the source code have not been updated yet. This patch
    corrects this.

    Signed-off-by: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • In inotify_new_watch() the number of watches for a group is compared
    against the max number of allowed watches and increased afterwards. The
    check and incrementation is not done atomically, so it is possible for
    multiple concurrent threads to pass the check and increment the number
    of marks above the allowed max.

    This patch uses an inotify groups mark_lock to ensure that both check
    and incrementation are done atomic. Furthermore we dont have to worry
    about the race that allows a concurrent thread to add a watch just after
    inotify_update_existing_watch() returned with -ENOENT anymore, since
    this is also synchronized by the groups mark mutex now.

    Signed-off-by: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • There is no need to use a special mutex to protect against the
    fcntl/close race (see dnotify.c for a description of this race).
    Instead the dnotify_groups mark mutex can be used.

    Signed-off-by: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • The code under the groups mark_mutex in fanotify_add_inode_mark() and
    fanotify_add_vfsmount_mark() is almost identical. So put it into a
    seperate function.

    Signed-off-by: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • For both adding an event to an existing mark and destroying a mark we
    first have to find it via fsnotify_find_[inode|vfsmount]_mark(). But
    getting the mark and adding an event (or destroying it) is not done
    atomically. This opens a race where a thread is about to destroy a mark
    while another thread still finds the same mark and adds an event to its
    mask although it will be destroyed.

    Another race exists concerning the excess of a groups number of marks
    limit: When a mark is added the number of group marks is checked against
    the max number of marks per group and increased afterwards. Since check
    and increment is also not done atomically, this may result in 2 or more
    processes passing the check at the same time and increasing the number
    of group marks above the allowed limit.

    With this patch both races are avoided by doing the concerning
    operations with the groups mark mutex locked.

    Signed-off-by: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • The ->reserved field isn't cleared so we leak one byte of stack
    information to userspace.

    Signed-off-by: Dan Carpenter
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     

29 Jun, 2013

1 commit


10 May, 2013

1 commit


02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

2 commits

  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     
  • When we run the crackerjack testsuite, the inotify_add_watch test is
    stalled.

    This is caused by the invalid mask 0 - the task is waiting for the event
    but it never comes. inotify_add_watch() should return -EINVAL as it did
    before commit 676a0675cf92 ("inotify: remove broken mask checks causing
    unmount to be EINVAL"). That commit removes the invalid mask check, but
    that check is needed.

    Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call.
    If none are set, just return -EINVAL.

    Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger
    the problem that above commit fixed.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Zhao Hongjiang
    Acked-by: Jim Somerville
    Cc: Paul Gortmaker
    Cc: Jerome Marchand
    Cc: Eric Paris
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhao Hongjiang
     

30 Apr, 2013

2 commits


04 Mar, 2013

1 commit


28 Feb, 2013

3 commits

  • I'm not sure why, but the hlist for each entry iterators were conceived

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    they don't really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foudnation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • Convert to the much saner new idr interface.

    Note that the adhoc cyclic id allocation is buggy. If wraparound
    happens, the previous code with idr_get_new_above() may segfault and
    the converted code will trigger WARN and return -EINVAL. Even if it's
    fixed to wrap to zero, the code will be prone to unnecessary -ENOSPC
    failures after the first wraparound. We probably need to implement
    proper cyclic support in idr.

    Signed-off-by: Tejun Heo
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

27 Feb, 2013

1 commit

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     

23 Feb, 2013

1 commit


22 Feb, 2013

1 commit

  • Running the command:

    inotifywait -e unmount /mnt/disk

    immediately aborts with a -EINVAL return code. This is however a valid
    parameter. This abort occurs only if unmount is the sole event
    parameter. If other event parameters are supplied, then the unmount
    event wait will work.

    The problem was introduced by commit 44b350fc23e ("inotify: Fix mask
    checks"). In that commit, it states:

    The mask checks in inotify_update_existing_watch() and
    inotify_new_watch() are useless because inotify_arg_to_mask()
    sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway.

    But instead of removing the useless checks, it did this:

    mask = inotify_arg_to_mask(arg);
    - if (unlikely(!mask))
    + if (unlikely(!(mask & IN_ALL_EVENTS)))
    return -EINVAL;

    The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other
    parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS. So the
    check should be:

    if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT))))

    But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask
    anyway, so the check is always going to pass and thus should simply be
    removed. Also note that inotify_arg_to_mask completely controls what
    mask bits get set from arg, there's no way for invalid bits to get
    enabled there.

    Lets fix it by simply removing the useless broken checks.

    Signed-off-by: Jim Somerville
    Signed-off-by: Paul Gortmaker
    Cc: Jerome Marchand
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Cc: [2.6.37+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Somerville
     

21 Dec, 2012

1 commit

  • Pull filesystem notification updates from Eric Paris:
    "This pull mostly is about locking changes in the fsnotify system. By
    switching the group lock from a spin_lock() to a mutex() we can now
    hold the lock across things like iput(). This fixes a problem
    involving unmounting a fs and having inodes be busy, first pointed out
    by FAT, but reproducible with tmpfs.

    This also restores signal driven I/O for inotify, which has been
    broken since about 2.6.32."

    Ugh. I *hate* the timing of this. It was rebased after the merge
    window opened, and then left to sit with the pull request coming the day
    before the merge window closes. That's just crap. But apparently the
    patches themselves have been around for over a year, just gathering
    dust, so now it's suddenly critical.

    Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen
    Rothwell's fixes from -next.

    * 'for-next' of git://git.infradead.org/users/eparis/notify:
    inotify: automatically restart syscalls
    inotify: dont skip removal of watch descriptor if creation of ignored event failed
    fanotify: dont merge permission events
    fsnotify: make fasync generic for both inotify and fanotify
    fsnotify: change locking order
    fsnotify: dont put marks on temporary list when clearing marks by group
    fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
    fsnotify: pass group to fsnotify_destroy_mark()
    fsnotify: use a mutex instead of a spinlock to protect a groups mark list
    fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
    fsnotify: take groups mark_lock before mark lock
    fsnotify: use reference counting for groups
    fsnotify: introduce fsnotify_get_group()
    inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()

    Linus Torvalds
     

18 Dec, 2012

3 commits

  • The kernel keeps FAN_MARK_IGNORED_SURV_MODIFY bit separately from
    fsnotify_mark::mask|ignored_mask thus put it in @mflags (mark flags)
    field so the user-space reader will be able to detect if such bit were
    used on mark creation procedure.

    | pos: 0
    | flags: 04002
    | fanotify flags:10 event-flags:0
    | fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
    | fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4

    Signed-off-by: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • This allow us to print out fsnotify details such as watchee inode, device,
    mask and optionally a file handle.

    For inotify objects if kernel compiled with exportfs support the output
    will be

    | pos: 0
    | flags: 02000000
    | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
    | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153
    | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:49b1060023552153

    If kernel compiled without exportfs support, the file handle
    won't be provided but inode and device only.

    | pos: 0
    | flags: 02000000
    | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0
    | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0
    | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0

    For fanotify the output is like

    | pos: 0
    | flags: 04002
    | fanotify flags:10 event-flags:0
    | fanotify mnt_id:12 mask:3b ignored_mask:0
    | fanotify ino:50205 sdev:800013 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:05020500fb1d47e7

    To minimize impact on general fsnotify code the new functionality
    is gathered in fs/notify/fdinfo.c file.

    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Alexey Dobriyan
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: "Aneesh Kumar K.V"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Fixes following sparse warning:

    fs/notify/inode_mark.c:127:22: warning: symbol 'fsnotify_find_inode_mark_locked' was not declared. Should it be static?

    Signed-off-by: Tushar Behera
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tushar Behera
     

14 Dec, 2012

1 commit

  • Pull trivial branch from Jiri Kosina:
    "Usual stuff -- comment/printk typo fixes, documentation updates, dead
    code elimination."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    HOWTO: fix double words typo
    x86 mtrr: fix comment typo in mtrr_bp_init
    propagate name change to comments in kernel source
    doc: Update the name of profiling based on sysfs
    treewide: Fix typos in various drivers
    treewide: Fix typos in various Kconfig
    wireless: mwifiex: Fix typo in wireless/mwifiex driver
    messages: i2o: Fix typo in messages/i2o
    scripts/kernel-doc: check that non-void fcts describe their return value
    Kernel-doc: Convention: Use a "Return" section to describe return values
    radeon: Fix typo and copy/paste error in comments
    doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
    various: Fix spelling of "asynchronous" in comments.
    Fix misspellings of "whether" in comments.
    eisa: Fix spelling of "asynchronous".
    various: Fix spelling of "registered" in comments.
    doc: fix quite a few typos within Documentation
    target: iscsi: fix comment typos in target/iscsi drivers
    treewide: fix typo of "suport" in various comments and Kconfig
    treewide: fix typo of "suppport" in various comments
    ...

    Linus Torvalds
     

12 Dec, 2012

14 commits

  • We were mistakenly returning EINTR when we found an outstanding signal.
    Instead we should returen ERESTARTSYS and allow the kernel to handle
    things the right way.

    Patch-from: Oleg Nesterov
    Signed-off-by: Eric Paris

    Eric Paris
     
  • In inotify_ignored_and_remove_idr() the removal of a watch descriptor is skipped
    if the allocation of an ignored event failed and we are leaking memory (the
    watch descriptor and the mark linked to it).
    This patch ensures that the watch descriptor is removed regardless of whether
    event creation failed or not.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Boyd Yang reported a problem for the case that multiple threads of the same
    thread group are waiting for a reponse for a permission event.
    In this case it is possible that some of the threads are never woken up, even
    if the response for the event has been received
    (see http://marc.info/?l=linux-kernel&m=131822913806350&w=2).

    The reason is that we are currently merging permission events if they belong to
    the same thread group. But we are not prepared to wake up more than one waiter
    for each event. We do

    wait_event(group->fanotify_data.access_waitq, event->response ||
    atomic_read(&group->fanotify_data.bypass_perm));
    and after that
    event->response = 0;

    which is the reason that even if we woke up all waiters for the same event
    some of them may see event->response being already set 0 again, then go back to
    sleep and block forever.

    With this patch we avoid that more than one thread is waiting for a response
    by not merging permission events for the same thread group any more.

    Reported-by: Boyd Yang
    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • inotify is supposed to support async signal notification when information
    is available on the inotify fd. This patch moves that support to generic
    fsnotify functions so it can be used by all notification mechanisms.

    Signed-off-by: Eric Paris

    Eric Paris
     
  • On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
    >
    > I finally built and tested a v3.0 kernel with these patches (I know I'm
    > SOOOOOO far behind). Not what I hoped for:
    >
    > > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day...
    > > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
    > > [ 150.946012] IP: [] shmem_free_inode+0x18/0x50
    > > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0
    > > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    > > [ 150.946012] CPU 0
    > > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
    > > [ 150.946012]
    > > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
    > > [ 150.946012] RIP: 0010:[] [] shmem_free_inode+0x18/0x50
    > > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282
    > > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
    > > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
    > > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
    > > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
    > > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
    > > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
    > > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    > > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
    > > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    > > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    > > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
    > > [ 150.946012] Stack:
    > > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
    > > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
    > > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
    > > [ 150.946012] Call Trace:
    > > [ 150.946012] [] shmem_evict_inode+0x76/0x130
    > > [ 150.946012] [] evict+0x7e/0x170
    > > [ 150.946012] [] iput_final+0xd0/0x190
    > > [ 150.946012] [] iput+0x33/0x40
    > > [ 150.946012] [] fsnotify_destroy_mark_locked+0x145/0x160
    > > [ 150.946012] [] fsnotify_destroy_mark+0x36/0x50
    > > [ 150.946012] [] sys_inotify_rm_watch+0x77/0xd0
    > > [ 150.946012] [] system_call_fastpath+0x16/0x1b
    > > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00
    > > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a
    > > [ 150.946012] RIP [] shmem_free_inode+0x18/0x50
    > > [ 150.946012] RSP
    > > [ 150.946012] CR2: 0000000000000070
    >
    > Looks at aweful lot like the problem from:
    > http://www.spinics.net/lists/linux-fsdevel/msg46101.html
    >

    I tried to reproduce this bug with your test program, but without success.
    However, if I understand correctly, this occurs since we dont hold any locks when
    we call iput() in mark_destroy(), right?
    With the patches you tested, iput() is also not called within any lock, since the
    groups mark_mutex is released temporarily before iput() is called. This is, since
    the original codes behaviour is similar.
    However since we now have a mutex as the biggest lock, we can do what you
    suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and
    call iput() with the mutex held to avoid the race.
    The patch below implements this. It uses nested locking to avoid deadlock in case
    we do the final iput() on an inode which still holds marks and thus would take
    the mutex again when calling fsnotify_inode_delete() in destroy_inode().

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • In clear_marks_by_group_flags() the mark list of a group is iterated and the
    marks are put on a temporary list.
    Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list
    any more and are able to remove the marks while the mark list is iterated and
    the mark list mutex is held.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked()
    which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but
    assume that the caller has already taken the groups mark mutex.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • In fsnotify_destroy_mark() dont get the group from the passed mark anymore,
    but pass the group itself as an additional parameter to the function.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead
    of a spinlock results in more flexibility (i.e it allows to sleep while the
    lock is held).

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • …ark should be destroyed

    This patch adds an extra flag to mark_remove_from_mask() to inform the caller if
    the mark should be destroyed.
    With this we dont destroy the mark implicitly in the function itself any more
    but let the caller handle it.

    Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
    Signed-off-by: Eric Paris <eparis@redhat.com>

    Lino Sanfilippo
     
  • Race-free addition and removal of a mark to a groups mark list would be easier
    if we could lock the mark list of group before we lock the specific mark.
    This patch changes the order used to add/remove marks to/from mark lists from

    1. mark->lock
    2. group->mark_lock
    3. inode->i_lock

    to

    1. group->mark_lock
    2. mark->lock
    3. inode->i_lock

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Get a group ref for each mark that is added to the groups list and release that
    ref when the mark is freed in fsnotify_put_mark().
    We also use get a group reference for duplicated marks and for private event
    data.
    Now we dont free a group any more when the number of marks becomes 0 but when
    the groups ref count does. Since this will only happen when all marks are removed
    from a groups mark list, we dont have to set the groups number of marks to 1 at
    group creation.

    Beside clearing all marks in fsnotify_destroy_group() we do also flush the
    groups event queue. This is since events may hold references to groups (due to
    private event data) and we have to put those references first before we get a
    chance to put the final ref, which will result in a call to
    fsnotify_final_destroy_group().

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Introduce fsnotify_get_group() which increments the reference counter of a group.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     
  • Currently in fsnotify_put_group() the ref count of a group is decremented and if
    it becomes 0 fsnotify_destroy_group() is called. Since a groups ref count is only
    at group creation set to 1 and never increased after that a call to fsnotify_put_group()
    always results in a call to fsnotify_destroy_group().
    With this patch fsnotify_destroy_group() is called directly.

    Signed-off-by: Lino Sanfilippo
    Signed-off-by: Eric Paris

    Lino Sanfilippo
     

19 Nov, 2012

1 commit