09 Jan, 2017

1 commit

  • commit 5716863e0f8251d3360d4cbfc0e44e08007075df upstream.

    fsnotify_unmount_inodes() plays complex tricks to pin the next inode in the
    sb->s_inodes list while iterating over all inodes. Furthermore, the code has a
    bug: if the current inode is the last one on i_sb_list that does not have e.g.
    I_FREEING set, we leave next_i pointing to an inode which may be removed
    from the i_sb_list once we drop s_inode_list_lock, resulting in
    use-after-free issues (usually manifesting as infinite looping in
    fsnotify_unmount_inodes()).

    Fix the problem by keeping current inode pinned somewhat longer. Then we can
    make the code much simpler and standard.
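
    A minimal sketch of the simplified loop, assuming the mainline helper names
    of the time (iput_inode keeps the previously visited inode pinned):

    struct inode *inode, *iput_inode = NULL;

    spin_lock(&sb->s_inode_list_lock);
    list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
            spin_lock(&inode->i_lock);
            if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW) ||
                !atomic_read(&inode->i_count)) {
                    spin_unlock(&inode->i_lock);
                    continue;
            }
            __iget(inode);
            spin_unlock(&inode->i_lock);
            spin_unlock(&sb->s_inode_list_lock);

            if (iput_inode)
                    iput(iput_inode);

            fsnotify_inode_delete(inode);
            /* keep this inode pinned until we have re-taken the list lock
               and moved past it, so its i_sb_list linkage stays valid */
            iput_inode = inode;

            spin_lock(&sb->s_inode_list_lock);
    }
    spin_unlock(&sb->s_inode_list_lock);

    if (iput_inode)
            iput(iput_inode);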

    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     

08 Oct, 2016

5 commits

  • Use assert_spin_locked() macro instead of hand-made BUG_ON statements.
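
    For illustration, the shape of the change (the lock name here is illustrative):

    /* before: hand-rolled assertion */
    BUG_ON(!spin_is_locked(&group->notification_lock));

    /* after: the standard, self-documenting helper */
    assert_spin_locked(&group->notification_lock);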

    Link: http://lkml.kernel.org/r/1474537439-18919-1-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Suggested-by: Heiner Kallweit
    Reviewed-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • When freeing permission events via fsnotify_destroy_event(), the
    WARN_ON(!list_empty(&event->list)); check may trigger falsely.

    This is because although fanotify_get_response() saw event->response
    set, there is nothing to make sure the current CPU also sees the removal
    of the event from the list. Add proper locking around the WARN_ON() to
    avoid the false warning.
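
    A minimal sketch of the fix, assuming the queue is protected by
    group->notification_lock as elsewhere in this series:

    /* take the queue lock so this CPU observes the list removal
       performed on the CPU that handled the response */
    spin_lock(&group->notification_lock);
    WARN_ON(!list_empty(&event->list));
    spin_unlock(&group->notification_lock);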

    Link: http://lkml.kernel.org/r/1473797711-14111-7-git-send-email-jack@suse.cz
    Reported-by: Miklos Szeredi
    Signed-off-by: Jan Kara
    Reviewed-by: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Fanotify code has its own lock (access_lock) to protect a list of events
    waiting for a response from userspace.

    However, this is somewhat awkward: the same list_head in the event is
    protected by notification_lock while it is part of the notification queue,
    but by access_lock while it is part of the fanotify private queue, which
    makes reliable checks in the generic code difficult. So make
    fanotify use the same lock - notification_lock - for protecting its
    private event list.
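
    An illustrative sketch (field names approximate) of moving a read event
    onto the private queue under the one shared lock:

    spin_lock(&group->notification_lock);
    /* same list_head, same lock, whichever queue the event sits on */
    list_move_tail(&event->list, &group->fanotify_data.access_list);
    spin_unlock(&group->notification_lock);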

    Link: http://lkml.kernel.org/r/1473797711-14111-6-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Lino Sanfilippo
    Cc: Miklos Szeredi
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • notification_mutex is used to protect the list of pending events. As such
    there's no reason to use a sleeping lock for it. Convert it to a
    spinlock.
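
    The shape of the conversion, sketched:

    /* before */
    mutex_lock(&group->notification_mutex);
    list_add_tail(&event->list, &group->notification_list);
    mutex_unlock(&group->notification_mutex);

    /* after */
    spin_lock(&group->notification_lock);
    list_add_tail(&event->list, &group->notification_list);
    spin_unlock(&group->notification_lock);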

    [jack@suse.cz: fixed version]
    Link: http://lkml.kernel.org/r/1474031567-1831-1-git-send-email-jack@suse.cz
    Link: http://lkml.kernel.org/r/1473797711-14111-5-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Lino Sanfilippo
    Tested-by: Guenter Roeck
    Cc: Miklos Szeredi
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • fsnotify_flush_notify() and fanotify_release() destroy notification
    event while holding notification_mutex.

    The destruction of a fanotify event includes a path_put() call which may
    end up calling into a filesystem to delete an inode if we happen to hold
    the last dentry reference, which in turn holds the last inode reference.

    That in turn may violate lock ordering for some filesystems since
    notification_mutex is also acquired e.g. during write when generating a
    fanotify event.

    Also this is the only thing that forces notification_mutex to be a
    sleeping lock. So drop notification_mutex before destroying a
    notification event.
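
    A sketch of the resulting flush loop, assuming a helper that dequeues the
    first pending event:

    mutex_lock(&group->notification_mutex);
    while (!list_empty(&group->notification_list)) {
            event = fsnotify_remove_first_event(group);
            mutex_unlock(&group->notification_mutex);
            /* may path_put() -> dput() -> iput(); no locks held here */
            fsnotify_destroy_event(group, event);
            mutex_lock(&group->notification_mutex);
    }
    mutex_unlock(&group->notification_mutex);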

    Link: http://lkml.kernel.org/r/1473797711-14111-4-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Cc: Miklos Szeredi
    Cc: Lino Sanfilippo
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

20 Sep, 2016

2 commits

  • fanotify_get_response() calls fsnotify_remove_event() when it finds that
    the group is being released from fanotify_release() (bypass_perm is set).

    However, the event it removes need not be in the group's notification
    queue: it may have already moved to access_list (userspace read the
    event before closing the fanotify instance fd), which is protected by a
    different lock. Thus when fsnotify_remove_event() races with
    fanotify_release() operating on access_list, the list can get corrupted.

    Fix the problem by moving all the logic removing permission events from
    the lists to one place - fanotify_release().

    Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
    Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reported-by: Miklos Szeredi
    Tested-by: Miklos Szeredi
    Reviewed-by: Miklos Szeredi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Implement a function that can be called when a group is being shut down
    to stop queueing new events to the group. Fanotify will use this.
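
    A minimal sketch of the idea, using the notification_mutex of the time:

    void fsnotify_group_stop_queueing(struct fsnotify_group *group)
    {
            mutex_lock(&group->notification_mutex);
            /* checked by fsnotify_add_event() under the same lock */
            group->shutdown = true;
            mutex_unlock(&group->notification_mutex);
    }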

    Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
    Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz
    Signed-off-by: Jan Kara
    Reviewed-by: Miklos Szeredi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

20 May, 2016

1 commit

  • An inotify instance is destroyed when all references to it are dropped.
    That not only means that the corresponding file descriptor needs to be
    closed but also that all corresponding instance marks are freed (as each
    mark holds a reference to the inotify instance). However, marks are
    freed only after the SRCU period ends, which can take some time, and thus
    if a user rapidly creates and frees inotify instances, the number of
    existing inotify instances can exceed the max_user_instances limit even
    though from the user's point of view there is always at most one existing
    instance. inotify_init() then returns an EMFILE error, which is hard to
    justify from the user's point of view. This problem is exposed by the
    LTP inotify06 testcase on some machines.

    We fix the problem by making sure all group marks are properly freed
    while destroying the inotify instance. We wait for the SRCU period to end
    on that path anyway, since we have to make sure no event is being added
    to the instance while we are tearing it down. So it takes only some
    plumbing to allow marks to be destroyed on that path as well, and not
    from a dedicated work item.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Jan Kara
    Reported-by: Xiaoguang Wang
    Tested-by: Xiaoguang Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

19 Feb, 2016

2 commits

  • We don't require a dedicated thread for fsnotify cleanup. Switch it
    over to a workqueue job instead that runs on the system_unbound_wq.

    In the interest of not thrashing the queued job too often when there are
    a lot of marks being removed, we delay the reaper job slightly when
    queueing it, to allow several to gather on the list.
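
    Sketch of the queueing, with an assumed small batching delay:

    #define FSNOTIFY_REAPER_DELAY (1)       /* jiffies; lets destroys batch up */

    static DECLARE_DELAYED_WORK(reaper_work, fsnotify_mark_destroy_workfn);

    /* when a mark is queued for destruction: */
    queue_delayed_work(system_unbound_wq, &reaper_work, FSNOTIFY_REAPER_DELAY);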

    Signed-off-by: Jeff Layton
    Tested-by: Eryu Guan
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • This reverts commit c510eff6beba ("fsnotify: destroy marks with
    call_srcu instead of dedicated thread").

    Eryu reported that he was seeing some OOM kills kick in when running a
    testcase that adds and removes inotify marks on a file in a tight loop.

    The above commit changed the code to use call_srcu to clean up the
    marks. While that does (in principle) work, the srcu callback job is
    limited to cleaning up entries in small batches and only once per jiffy.
    It's easily possible to overwhelm that machinery with too many call_srcu
    callbacks, and Eryu's reproducer did just that.

    There's also another potential problem with using call_srcu here. While
    you can obviously sleep while holding the srcu_read_lock, the callbacks
    run under local_bh_disable, so you can't sleep there.

    It's possible when putting the last reference to the fsnotify_mark that
    we'll end up putting a chain of references including the fsnotify_group,
    uid, and associated keys. While I don't see any obvious ways that that
    could occur, it's probably still best to avoid using call_srcu here
    after all.

    This patch reverts the above patch. A later patch will take a different
    approach to eliminating the dedicated thread here.

    Signed-off-by: Jeff Layton
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Cc: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     

15 Jan, 2016

2 commits

  • At the time that this code was originally written, call_srcu didn't
    exist, so this thread was required to ensure that we waited for that
    SRCU grace period to settle before finally freeing the object.

    It does exist now, however, and we can use call_srcu to handle this much
    more efficiently. That also allows us to use srcu_barrier to ensure that
    all of the callbacks have run before proceeding. In order to conserve
    space, we union the rcu_head with the g_list.

    This will be necessary for nfsd, which will allocate marks from a
    dedicated slabcache. We have to be able to ensure that all of the
    objects are destroyed before destroying the cache, which is fairly
    straightforward once srcu_barrier can be used to wait for the callbacks.
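
    The space-saving union, roughly:

    /* sketch: only the relevant members shown */
    struct fsnotify_mark {
            union {
                    struct list_head g_list; /* linkage on the group's mark list */
                    struct rcu_head rcu;     /* reused for call_srcu once off that list */
            };
            /* ... other members unchanged ... */
    };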

    Signed-off-by: Jeff Layton
    Cc: Eric Paris
    Reviewed-by: Jan Kara
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • To make the intention clearer, use list_next_entry instead of
    list_entry.
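
    The change in a line, sketched:

    /* before */
    mark = list_entry(mark->g_list.next, struct fsnotify_mark, g_list);
    /* after */
    mark = list_next_entry(mark, g_list);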

    Signed-off-by: Geliang Tang
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geliang Tang
     

06 Nov, 2015

2 commits

  • The comment here says that it is checking for invalid bits. But, the mask
    is *actually* checking to ensure that _any_ valid bit is set, which is
    quite different.

    Without this check, an unexpected bit could get set on an inotify object.
    Since these bits are also interpreted by the fsnotify/dnotify code, there
    is the potential for an object to be mishandled inside the kernel. For
    instance, can we be sure that setting the dnotify flag FS_DN_RENAME on an
    inotify watch is harmless?

    Add the actual check which was intended. Retain the existing check as
    well, since it ensures that some inotify bits are being added to the
    watch. Plus, this is existing behavior which would be nice to preserve.
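
    A sketch of the intended check in the inotify_add_watch() path:

    /* reject masks that contain no valid inotify event bits at all */
    if (unlikely(!(mask & ALL_INOTIFY_BITS)))
            return -EINVAL;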

    I did a quick sniff test to confirm that inotify still functions and that
    my 'inotify-tools' package passes 'make check'.

    Signed-off-by: Dave Hansen
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Cc: Josh Boyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • There was a report that my patch:

    inotify: actually check for invalid bits in sys_inotify_add_watch()

    broke CRIU.

    The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/* to
    figure out how to rebuild inotify watches and then passes those flags
    directly back in to the inotify API. One of those flags
    (FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the inotify
    API. It is used inside the kernel to _implement_ inotify but it is not
    and has never been part of the API.

    My patch above ensured that we only allow bits which are part of the API
    (IN_ALL_EVENTS). This broke CRIU.

    FS_EVENT_ON_CHILD is really internal to the kernel. It is set _anyway_ on
    all inotify marks. So, CRIU was really just trying to set a bit that was
    already set.

    This patch hides that bit from fdinfo. CRIU will not see the bit, not try
    to set it, and should work as before. We should not have been exposing
    this bit in the first place, so this is a good patch independent of the
    CRIU problem.
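
    A sketch of hiding the internal bit when formatting fdinfo:

    /* report only API-visible bits in /proc/$pid/fdinfo/<fd> */
    __u32 user_mask = mark->mask & ~FS_EVENT_ON_CHILD;

    seq_printf(m, " mask:%x", user_mask);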

    Signed-off-by: Dave Hansen
    Reported-by: Andrey Vagin
    Acked-by: Andrey Vagin
    Acked-by: Cyrill Gorcunov
    Acked-by: Eric Paris
    Cc: Pavel Emelyanov
    Cc: John McCutchan
    Cc: Robert Love
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

06 Sep, 2015

1 commit

  • Pull vfs updates from Al Viro:
    "In this one:

    - d_move fixes (Eric Biederman)

    - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
    handling ought to be fixed)

    - switch of sb_writers to percpu rwsem (Oleg Nesterov)

    - superblock scalability (Josef Bacik and Dave Chinner)

    - swapon(2) race fix (Hugh Dickins)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
    vfs: Test for and handle paths that are unreachable from their mnt_root
    dcache: Reduce the scope of i_lock in d_splice_alias
    dcache: Handle escaped paths in prepend_path
    mm: fix potential data race in SyS_swapon
    inode: don't softlockup when evicting inodes
    inode: rename i_wb_list to i_io_list
    sync: serialise per-superblock sync operations
    inode: convert inode_sb_list_lock to per-sb
    inode: add hlist_fake to avoid the inode hash lock in evict
    writeback: plug writeback at a high level
    change sb_writers to use percpu_rw_semaphore
    shift percpu_counter_destroy() into destroy_super_work()
    percpu-rwsem: kill CONFIG_PERCPU_RWSEM
    percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
    percpu-rwsem: introduce percpu_down_read_trylock()
    document rwsem_release() in sb_wait_write()
    fix the broken lockdep logic in __sb_start_write()
    introduce __sb_writers_{acquired,release}() helpers
    ufs_inode_get{frag,block}(): get rid of 'phys' argument
    ufs_getfrag_block(): tidy up a bit
    ...

    Linus Torvalds
     

05 Sep, 2015

4 commits

  • fsnotify_destroy_mark_locked() is subtle to use because it temporarily
    releases group->mark_mutex. To avoid future problems with this
    function, split it into two.

    fsnotify_detach_mark() is the part that needs group->mark_mutex and
    fsnotify_free_mark() is the part that must be called outside of
    group->mark_mutex. This way it's much clearer what's going on and we
    also avoid some pointless acquisitions of group->mark_mutex.
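
    Typical usage after the split, sketched:

    mutex_lock(&group->mark_mutex);
    fsnotify_detach_mark(mark);     /* requires group->mark_mutex */
    mutex_unlock(&group->mark_mutex);
    fsnotify_free_mark(mark);       /* must be called without group->mark_mutex */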

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • The free list is used when all marks on a given inode / mount should be
    destroyed as the inode / mount is going away. However, with some care we
    can free all of the marks without using a special list.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • A check in inotify_fdinfo() verifying whether a mark is valid was always
    true due to a bug. Luckily we can never get to invalidated marks, since
    we hold mark_mutex and invalidated marks get removed from the group list
    when they are invalidated under that mutex.

    Anyway, fix the check to make the code more future-proof.

    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • I have a _tiny_ microbenchmark that sits in a loop and writes single
    bytes to a file. Writing one byte to a tmpfs file is around 2x slower
    than reading one byte from a file, which is a _bit_ more than I expected.
    This is a dumb benchmark, but I think it's hard to deny that write() is
    a hot path and we should avoid unnecessary overhead there.

    I did a 'perf record' of 30-second samples of read and write. The top
    item in a diffprofile is srcu_read_lock() from fsnotify(). There are
    active inotify fd's from systemd, but nothing is actually listening to
    the file or that part of the filesystem.

    I *think* we can avoid taking the srcu_read_lock() for the common case
    where there are no actual marks on the file. This means that there will
    both be nothing to notify for *and* implies that there is no need for
    clearing the ignore mask.
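
    A sketch of the early bail-out at the top of fsnotify(), before
    srcu_read_lock() is taken (field spellings approximate):

    /* no marks anywhere that could care about this inode: nothing to do */
    if (hlist_empty(&to_tell->i_fsnotify_marks) &&
        (!mnt || hlist_empty(&mnt->mnt_fsnotify_marks)))
            return 0;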

    This patch gave a 13.1% speedup in writes/second on my test, which is an
    improvement from the 10.8% that I saw with the last version.

    Signed-off-by: Dave Hansen
    Reviewed-by: Jan Kara
    Cc: Al Viro
    Cc: Eric Paris
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

18 Aug, 2015

1 commit

  • The process of reducing contention on per-superblock inode lists
    starts with moving the locking to match the per-superblock inode
    list. This takes the global lock out of the picture and reduces the
    contention problems to within a single filesystem. This doesn't get
    rid of contention as the locks still have global CPU scope, but it
    does isolate operations on different superblocks from each other.
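
    The mechanical core of the change, sketched:

    /* before: one global lock covering every superblock's inode list */
    spin_lock(&inode_sb_list_lock);
    list_add(&inode->i_sb_list, &sb->s_inodes);
    spin_unlock(&inode_sb_list_lock);

    /* after: a per-superblock lock */
    spin_lock(&sb->s_inode_list_lock);
    list_add(&inode->i_sb_list, &sb->s_inodes);
    spin_unlock(&sb->s_inode_list_lock);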

    Signed-off-by: Dave Chinner
    Signed-off-by: Josef Bacik
    Reviewed-by: Jan Kara
    Reviewed-by: Christoph Hellwig
    Tested-by: Dave Chinner

    Dave Chinner
     

07 Aug, 2015

1 commit

  • fsnotify_clear_marks_by_group_flags() can race with
    fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
    drops mark_mutex, a mark from the list iterated by
    fsnotify_clear_marks_by_group_flags() can be freed; the next-entry
    pointer we have cached then becomes stale and we dereference freed
    memory.

    Fix the problem by first moving the marks to be freed to a special
    private list and then always freeing the first entry in that list. This
    method is safe even when entries can disappear from the list once we
    drop the lock.
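
    A condensed sketch of the pattern (list and flag names approximate):

    LIST_HEAD(to_free);

    /* under mark_mutex: move every victim onto our private list */
    mutex_lock(&group->mark_mutex);
    list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list)
            if (mark->flags & flags)
                    list_move(&mark->g_list, &to_free);
    mutex_unlock(&group->mark_mutex);

    /* always take the first entry: safe even if other entries vanish
       while the lock is dropped inside the destroy path */
    while (1) {
            mutex_lock(&group->mark_mutex);
            if (list_empty(&to_free)) {
                    mutex_unlock(&group->mark_mutex);
                    break;
            }
            mark = list_first_entry(&to_free, struct fsnotify_mark, g_list);
            fsnotify_get_mark(mark);
            fsnotify_destroy_mark_locked(mark, group);
            mutex_unlock(&group->mark_mutex);
            fsnotify_put_mark(mark);
    }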

    Signed-off-by: Jan Kara
    Reported-by: Ashish Sangwan
    Reviewed-by: Ashish Sangwan
    Cc: Lino Sanfilippo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

22 Jul, 2015

1 commit

  • This reverts commit a2673b6e040663bf16a552f8619e6bde9f4b9acf.

    Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:

    "Thanks for report! You are right that my patch introduces a race
    between fsnotify kthread and fsnotify_destroy_group() which can result
    in leaking inotify event on group destruction.

    I haven't yet decided whether the right fix is not to queue events for
    dying notification group (as that is pointless anyway) or whether we
    should just fix the original problem differently... Whenever I look
    at fsnotify code mark handling I get lost in the maze of locks, lists,
    and subtle differences between how different notification systems
    handle notification marks :( I'll think about it over night"

    and after thinking about it, Jan says:

    "OK, I have looked into the code some more and I found another
    relatively simple way of fixing the original oops. It will be IMHO
    better than trying to fixup this issue which has more potential for
    breakage. I'll ask Linus to revert the fsnotify fix he already merged
    and send a new fix"

    Reported-by: Kinglong Mee
    Requested-by: Jan Kara
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

18 Jul, 2015

1 commit

  • fsnotify_clear_marks_by_group_flags() can race with
    fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
    drops mark_mutex, a mark from the list iterated by
    fsnotify_clear_marks_by_group_flags() can be freed and we dereference
    freed memory in the loop there.

    Fix the problem by keeping mark_mutex held in
    fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that
    we need to call a ->freeing_mark() callback which may acquire mark_mutex
    again. To avoid this and similar lock inversion issues, we move the call
    to ->freeing_mark() callback to the kthread destroying the mark.

    Signed-off-by: Jan Kara
    Reported-by: Ashish Sangwan
    Suggested-by: Lino Sanfilippo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

17 Jun, 2015

1 commit

  • The INOTIFY_USER option is bool, and hence this code is either
    present or absent. It will never be modular, so using
    module_init as an alias for __initcall is rather misleading.

    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future. If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.

    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups. As __initcall gets
    mapped onto device_initcall, our use of fs_initcall (which
    makes sense for fs code) will thus change this registration
    from level 6-device to level 5-fs (i.e. slightly earlier).
    However no observable impact of that small difference has
    been observed during testing, or is expected.
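
    The change itself is a one-liner, sketched here with the inotify init
    function:

    /* before */
    module_init(inotify_user_setup);
    /* after: registers at the fs initcall level instead */
    fs_initcall(inotify_user_setup);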

    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

13 Mar, 2015

1 commit

  • With FAN_ONDIR set, the user can end up getting events which it hasn't
    marked for. This was revealed by a fanotify04 testcase failure on
    Linux-4.0-rc1, and is a regression from 3.19 introduced by 66ba93c0d7fe6
    ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").

    # /opt/ltp/testcases/bin/fanotify04
    [ ... ]
    fanotify04 7 TPASS : event generated properly for type 100000
    fanotify04 8 TFAIL : fanotify04.c:147: got unexpected event 30
    fanotify04 9 TPASS : No event as expected

    The testcase adds the following marks: FAN_OPEN | FAN_ONDIR for a
    fanotify mark on a dir. Then it does an open(), followed by a close(),
    of the directory and expects to see an event FAN_OPEN(0x20). However,
    fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)). This happens due to
    a flaw in the check for event_mask in fanotify_should_send_event(),
    which does:

    if (event_mask & marks_mask & ~marks_ignored_mask)
            return true;

    where event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
          marks_mask == (FAN_ONDIR | FAN_OPEN),
          marks_ignored_mask == 0

    Fix this by masking the outgoing events to the user, as we already take
    care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
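
    A sketch of the fixed check, masking events to the outgoing set:

    /* only bits that may legitimately reach userspace can match */
    if (event_mask & FAN_ALL_OUTGOING_EVENTS & marks_mask &
        ~marks_ignored_mask)
            return true;
    return false;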

    Signed-off-by: Suzuki K. Poulose
    Tested-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Cc: Will Deacon

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suzuki K. Poulose
     

23 Feb, 2015

2 commits

  • Fanotify probably doesn't want to watch autodirs, so make it use
    d_can_lookup() rather than d_is_dir() when checking a dir watch, and
    give an error on fake directories.
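
    A sketch of the check in the fanotify mark path (control flow
    approximate):

    /* fake (automount) dirs have no ->lookup op; refuse to watch them */
    if ((flags & FAN_MARK_ONLYDIR) && !d_can_lookup(path->dentry))
            return -ENOTDIR;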

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my $fd;
    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
        die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
        print "No matches\n";
        exit(0);
    }

    my @cocci = (
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISLNK(E->d_inode->i_mode)',
        '+ d_is_symlink(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISDIR(E->d_inode->i_mode)',
        '+ d_is_dir(E)',
        '',
        '@@',
        'expression E;',
        '@@',
        '',
        '- S_ISREG(E->d_inode->i_mode)',
        '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
        chomp $file;
        print "Processing ", $file, "\n";
        system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
            die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

11 Feb, 2015

3 commits

  • Currently FAN_ONDIR is always set on a mark's ignored mask when the
    event mask is extended without FAN_MARK_ONDIR being set. This may
    result in events for directories being ignored unexpectedly for call
    sequences like

    fanotify_mark(fd, FAN_MARK_ADD, FAN_OPEN | FAN_ONDIR , AT_FDCWD, "dir");
    fanotify_mark(fd, FAN_MARK_ADD, FAN_CLOSE, AT_FDCWD, "dir");

    Also FAN_MARK_ONDIR is only honored when adding events to a mark's mask,
    but not for event removal. Fix both issues by not setting FAN_ONDIR
    implicitly on the ignore mask any more. Instead treat FAN_ONDIR as any
    other event flag and require FAN_MARK_ONDIR to be set by the user for
    both event mask and ignore mask. Furthermore take FAN_MARK_ONDIR into
    account when set for event removal.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • If removing bits from a mark's ignored mask, the corresponding
    inode's/vfsmount's mask is not affected. So don't recalculate it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • In fanotify_mark_remove_from_mask() a mark is destroyed if only one of
    both bitmasks (mask or ignored_mask) of a mark is cleared. However the
    other mask may still be set and contain information that should not be
    lost. So only destroy a mark if both masks are cleared.
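
    Sketch of the corrected condition:

    /* destroy the mark only once both masks are empty */
    *destroy = !(mark->mask | mark->ignored_mask);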

    Signed-off-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     

09 Jan, 2015

1 commit

  • As per e23738a7300a ("sched, inotify: Deal with nested sleeps").

    fanotify_read is a wait loop with sleeps in it. Wait loops rely on
    task_struct::state and sleeps do too, since that's the only means of
    actually sleeping. Therefore the nested sleeps destroy the wait loop
    state, and the wait loop breaks the sleep functions that assume
    TASK_RUNNING (mutex_lock).

    Fix this by using the new woken_wake_function and wait_woken() stuff,
    which registers wakeups in wait and thereby allows shrinking the
    task_state::state changes to the actual sleep part.
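
    A condensed sketch of the wait_woken() pattern as applied to
    fanotify_read() (helper names as in fanotify_user.c):

    DEFINE_WAIT_FUNC(wait, woken_wake_function);

    add_wait_queue(&group->notification_waitq, &wait);
    while (1) {
            /* may take mutexes safely here: the task stays TASK_RUNNING */
            event = get_one_event(group, count);
            if (event || signal_pending(current))
                    break;
            /* the only place that sleeps; wakeups registered in 'wait'
               between the check above and this call are not lost */
            wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
    }
    remove_wait_queue(&group->notification_waitq, &wait);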

    Reported-by: Yuanhan Liu
    Reported-by: Sedat Dilek
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Takashi Iwai
    Cc: Al Viro
    Cc: Eric Paris
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Jan, 2015

1 commit

  • SRCU does not need to be compiled in by default in all cases. For
    tinification efforts, not compiling SRCU unless necessary is desirable.

    The current patch tries to make compiling SRCU optional by introducing a new
    Kconfig option CONFIG_SRCU which is selected when any of the components making
    use of SRCU are selected.

    If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.

       text    data     bss     dec     hex filename
       2007       0       0    2007     7d7 kernel/rcu/srcu.o

    Size of arch/powerpc/boot/zImage changes from

       text    data     bss     dec     hex filename
     831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
     829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after

    so the savings are about ~2000 bytes.
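
    The Kconfig plumbing, sketched:

    # a hidden option, enabled only via 'select'
    config SRCU
            bool

    # every SRCU user selects it, e.g.:
    config FSNOTIFY
            def_bool n
            select SRCU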

    Signed-off-by: Pranith Kumar
    CC: Paul E. McKenney
    CC: Josh Triplett
    CC: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]

    Pranith Kumar
     

14 Dec, 2014

2 commits

  • destroy_list is used to track marks which still need to wait for the
    SRCU period to end before they can be freed. However, by the time a mark
    is added to destroy_list it isn't in the group's list of marks anymore,
    so we can reuse fsnotify_mark->g_list for queueing into destroy_list.
    This saves two pointers for each fsnotify_mark.
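
    Sketch of the queueing after the change (static names as in mark.c of
    the time):

    spin_lock(&destroy_lock);
    /* the mark is already off the group's list, so g_list is free to reuse */
    list_add(&mark->g_list, &destroy_list);
    spin_unlock(&destroy_lock);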

    Signed-off-by: Jan Kara
    Cc: Eric Paris
    Cc: Heinrich Schuchardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • There's a lot of common code in inode and mount marks handling. Factor it
    out to a common helper function.

    Signed-off-by: Jan Kara
    Cc: Eric Paris
    Cc: Heinrich Schuchardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

11 Dec, 2014

1 commit

  • Pull VFS changes from Al Viro:
    "First pile out of several (there _definitely_ will be more). Stuff in
    this one:

    - unification of d_splice_alias()/d_materialize_unique()

    - iov_iter rewrite

    - killing a bunch of ->f_path.dentry users (and f_dentry macro).

    Getting that completed will make life much simpler for
    unionmount/overlayfs, since then we'll be able to limit the places
    sensitive to file _dentry_ to reasonably few. Which allows to have
    file_inode(file) pointing to inode in a covered layer, with dentry
    pointing to (negative) dentry in union one.

    Still not complete, but much closer now.

    - crapectomy in lustre (dead code removal, mostly)

    - "let's make seq_printf return nothing" preparations

    - assorted cleanups and fixes

    There _definitely_ will be more piles"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    copy_from_iter_nocache()
    new helper: iov_iter_kvec()
    csum_and_copy_..._iter()
    iov_iter.c: handle ITER_KVEC directly
    iov_iter.c: convert copy_to_iter() to iterate_and_advance
    iov_iter.c: convert copy_from_iter() to iterate_and_advance
    iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
    iov_iter.c: convert iov_iter_zero() to iterate_and_advance
    iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
    iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
    iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
    iov_iter.c: iterate_and_advance
    iov_iter.c: macros for iterating over iov_iter
    kill f_dentry macro
    dcache: fix kmemcheck warning in switch_names
    new helper: audit_file()
    nfsd_vfs_write(): use file_inode()
    ncpfs: use file_inode()
    kill f_dentry uses
    lockd: get rid of ->f_path.dentry->d_sb
    ...

    Linus Torvalds
     

10 Dec, 2014

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle are:

    - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

    This instruments might_sleep() checks to catch places that nest
    blocking primitives - such as mutex usage in a wait loop. Such
    bugs can result in hard to debug races/hangs.

    Another category of invalid nesting that this facility will detect
    is the calling of blocking functions from within schedule() ->
    sched_submit_work() -> blk_schedule_flush_plug().

    There's some potential for false positives (if secondary blocking
    primitives themselves are not ready yet for this facility), but the
    kernel will warn once about such bugs per bootup, so the warning
    isn't much of a nuisance.

    This feature comes with a number of fixes, for problems uncovered
    with it, so no messages are expected normally.

    - Another round of sched/numa optimizations and refinements, for
    CONFIG_NUMA_BALANCING=y.

    - Another round of sched/dl fixes and refinements.

    Plus various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
    sched: Add missing rcu protection to wake_up_all_idle_cpus
    sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
    sched/numa: Init numa balancing fields of init_task
    sched/deadline: Remove unnecessary definitions in cpudeadline.h
    sched/cpupri: Remove unnecessary definitions in cpupri.h
    sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
    sched/fair: Fix stale overloaded status in the busiest group finding logic
    sched: Move p->nr_cpus_allowed check to select_task_rq()
    sched/completion: Document when to use wait_for_completion_io_*()
    sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
    sched/fair: Kill task_struct::numa_entry and numa_group::task_list
    sched: Refactor task_struct to use numa_faults instead of numa_* pointers
    sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
    sched/deadline: Reschedule from switched_from_dl() after a successful pull
    sched/deadline: Push task away if the deadline is equal to curr during wakeup
    sched/deadline: Add deadline rq status print
    sched/deadline: Fix artificial overrun introduced by yield_task_dl()
    sched/rt: Clean up check_preempt_equal_prio()
    sched/core: Use dl_bw_of() under rcu_read_lock_sched()
    sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
    ...

    Linus Torvalds
     

20 Nov, 2014

1 commit

  • …git/rostedt/linux-trace into for-next

    Pull the beginning of seq_file cleanup from Steven:
    "I'm looking to clean up the seq_file code and to eventually merge the
    trace_seq code with seq_file as well, since they basically do the same thing.

    Part of this process is to remove the return code of seq_printf() and friends
    as they are rather inconsistent. It is better to use the new function
    seq_has_overflowed() if you want to stop processing when the buffer
    is full. Note, if the buffer is full, the seq_file code will throw away
    the contents, allocate a bigger buffer, and then call your code again
    to fill in the data. The only thing that breaking out of the function
    early does is to save a little time which is probably never noticed.

    I started with patches from Joe Perches and modified them as well.
    There's many more places that need to be updated before we can convert
    seq_printf() and friends to return void. But this patch set introduces
    the seq_has_overflowed() and does some initial updates."
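
    Illustrative use of the new helper:

    seq_printf(m, "%s: %d\n", name, value);
    if (seq_has_overflowed(m))
            return 0;   /* buffer full: stop early; seq_file retries larger */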

    Al Viro