18 Jul, 2015

1 commit

  • fsnotify_clear_marks_by_group_flags() can race with
    fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
    mark_mutex, a mark from the list iterated by
    fsnotify_clear_marks_by_group_flags() can be freed and we dereference freed
    memory in the loop there.

    Fix the problem by keeping mark_mutex held in
    fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that
    we need to call a ->freeing_mark() callback which may acquire mark_mutex
    again. To avoid this and similar lock inversion issues, we move the call
    to ->freeing_mark() callback to the kthread destroying the mark.

    Signed-off-by: Jan Kara
    Reported-by: Ashish Sangwan
    Suggested-by: Lino Sanfilippo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

17 Jun, 2015

1 commit

  • The INOTIFY_USER option is bool, and hence this code is either
    present or absent. It will never be modular, so using
    module_init as an alias for __initcall is rather misleading.

    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future. If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.

    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups. As __initcall gets
    mapped onto device_initcall, our use of fs_initcall (which
    makes sense for fs code) will thus change this registration
    from level 6-device to level 5-fs (i.e. slightly earlier).
    However, no observable impact of that small difference has been seen
    during testing, nor is any expected.

    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

13 Mar, 2015

1 commit

  • With FAN_ONDIR set, the user can end up getting events which it hasn't
    asked for. This was revealed by a fanotify04 testcase failure on
    Linux-4.0-rc1, and is a regression from 3.19 introduced by commit
    66ba93c0d7fe6 ("fanotify: don't set FAN_ONDIR implicitly on a marks
    ignored mask").

    # /opt/ltp/testcases/bin/fanotify04
    [ ... ]
    fanotify04 7 TPASS : event generated properly for type 100000
    fanotify04 8 TFAIL : fanotify04.c:147: got unexpected event 30
    fanotify04 9 TPASS : No event as expected

    The testcase adds the following marks: FAN_OPEN | FAN_ONDIR for a
    fanotify watch on a dir. Then it does an open(), followed by a close(),
    of the directory and expects to see the event FAN_OPEN(0x20). However,
    fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)). This happens due to
    a flaw in the check for event_mask in fanotify_should_send_event(), which
    does:

        if (event_mask & marks_mask & ~marks_ignored_mask)
                return true;

    where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
    marks_mask == (FAN_ONDIR | FAN_OPEN),
    marks_ignored_mask == 0

    Fix this by masking the outgoing events to the user, as we already take
    care of FAN_ONDIR and FAN_EVENT_ON_CHILD.

    Signed-off-by: Suzuki K. Poulose
    Tested-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Cc: Will Deacon

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Suzuki K. Poulose
     

23 Feb, 2015

2 commits

  • Fanotify probably doesn't want to watch autodirs, so make it use
    d_can_lookup() rather than d_is_dir() when checking a dir watch, and give
    an error on fake directories.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
    print "No matches\n";
    exit(0);
    }

    my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
    die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

11 Feb, 2015

3 commits

  • Currently FAN_ONDIR is always set on a mark's ignored mask when the
    event mask is extended without FAN_MARK_ONDIR being set. This may
    result in events for directories being ignored unexpectedly for call
    sequences like

    fanotify_mark(fd, FAN_MARK_ADD, FAN_OPEN | FAN_ONDIR , AT_FDCWD, "dir");
    fanotify_mark(fd, FAN_MARK_ADD, FAN_CLOSE, AT_FDCWD, "dir");

    Also FAN_MARK_ONDIR is only honored when adding events to a mark's mask,
    but not for event removal. Fix both issues by not setting FAN_ONDIR
    implicitly on the ignore mask any more. Instead treat FAN_ONDIR as any
    other event flag and require FAN_MARK_ONDIR to be set by the user for
    both event mask and ignore mask. Furthermore take FAN_MARK_ONDIR into
    account when set for event removal.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • If bits are removed only from a mark's ignored mask, the corresponding
    inode's/vfsmount's mask is not affected, so don't recalculate it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     
  • In fanotify_mark_remove_from_mask() a mark is destroyed if only one of
    both bitmasks (mask or ignored_mask) of a mark is cleared. However the
    other mask may still be set and contain information that should not be
    lost. So only destroy a mark if both masks are cleared.

    Signed-off-by: Lino Sanfilippo
    Reviewed-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lino Sanfilippo
     

09 Jan, 2015

1 commit

  • As per e23738a7300a ("sched, inotify: Deal with nested sleeps").

    fanotify_read() is a wait loop with sleeps in it. Wait loops rely on
    task_struct::state and so do sleeps, since that's the only means of
    actually sleeping. Therefore the nested sleeps destroy the wait-loop
    state, and the wait loop breaks the sleep functions that assume
    TASK_RUNNING (mutex_lock).

    Fix this by using the new woken_wake_function and wait_woken() machinery,
    which registers wakeups in wait and thereby allows shrinking the
    task_struct::state changes to the actual sleep part.

    Reported-by: Yuanhan Liu
    Reported-by: Sedat Dilek
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Takashi Iwai
    Cc: Al Viro
    Cc: Eric Paris
    Cc: Linus Torvalds
    Cc: Eric Paris
    Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Jan, 2015

1 commit

  • SRCU does not need to be compiled by default in all cases; for
    tinification efforts, not compiling SRCU unless necessary is desirable.

    This patch makes compiling SRCU optional by introducing a new Kconfig
    option, CONFIG_SRCU, which is selected when any of the components making
    use of SRCU are selected.

    If CONFIG_SRCU is not selected, srcu.o will not be compiled at all.

      text    data     bss     dec     hex  filename
      2007       0       0    2007     7d7  kernel/rcu/srcu.o

    Size of arch/powerpc/boot/zImage changes from

      text    data     bss     dec     hex  filename
    831552   64180   23944  919676   e087c  arch/powerpc/boot/zImage : before
    829504   64180   23952  917636   e0084  arch/powerpc/boot/zImage : after

    so the savings are about 2000 bytes.

    Signed-off-by: Pranith Kumar
    CC: Paul E. McKenney
    CC: Josh Triplett
    CC: Lai Jiangshan
    Signed-off-by: Paul E. McKenney
    [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]

    Pranith Kumar
     

14 Dec, 2014

2 commits

  • destroy_list is used to track marks which still need to wait for an SRCU
    period to end before they can be freed. However, by the time a mark is
    added to destroy_list it isn't in the group's list of marks anymore, and
    thus we can reuse fsnotify_mark->g_list for queueing onto destroy_list.
    This saves two pointers for each fsnotify_mark.

    Signed-off-by: Jan Kara
    Cc: Eric Paris
    Cc: Heinrich Schuchardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • There's a lot of common code in inode and mount marks handling. Factor it
    out to a common helper function.

    Signed-off-by: Jan Kara
    Cc: Eric Paris
    Cc: Heinrich Schuchardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

11 Dec, 2014

1 commit

  • Pull VFS changes from Al Viro:
    "First pile out of several (there _definitely_ will be more). Stuff in
    this one:

    - unification of d_splice_alias()/d_materialize_unique()

    - iov_iter rewrite

    - killing a bunch of ->f_path.dentry users (and f_dentry macro).

    Getting that completed will make life much simpler for
    unionmount/overlayfs, since then we'll be able to limit the places
    sensitive to file _dentry_ to reasonably few. Which allows to have
    file_inode(file) pointing to inode in a covered layer, with dentry
    pointing to (negative) dentry in union one.

    Still not complete, but much closer now.

    - crapectomy in lustre (dead code removal, mostly)

    - "let's make seq_printf return nothing" preparations

    - assorted cleanups and fixes

    There _definitely_ will be more piles"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    copy_from_iter_nocache()
    new helper: iov_iter_kvec()
    csum_and_copy_..._iter()
    iov_iter.c: handle ITER_KVEC directly
    iov_iter.c: convert copy_to_iter() to iterate_and_advance
    iov_iter.c: convert copy_from_iter() to iterate_and_advance
    iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
    iov_iter.c: convert iov_iter_zero() to iterate_and_advance
    iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
    iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
    iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
    iov_iter.c: iterate_and_advance
    iov_iter.c: macros for iterating over iov_iter
    kill f_dentry macro
    dcache: fix kmemcheck warning in switch_names
    new helper: audit_file()
    nfsd_vfs_write(): use file_inode()
    ncpfs: use file_inode()
    kill f_dentry uses
    lockd: get rid of ->f_path.dentry->d_sb
    ...

    Linus Torvalds
     

10 Dec, 2014

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle are:

    - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

    This instruments might_sleep() checks to catch places that nest
    blocking primitives - such as mutex usage in a wait loop. Such
    bugs can result in hard to debug races/hangs.

    Another category of invalid nesting that this facility will detect
    is the calling of blocking functions from within schedule() ->
    sched_submit_work() -> blk_schedule_flush_plug().

    There's some potential for false positives (if secondary blocking
    primitives themselves are not ready yet for this facility), but the
    kernel will warn once about such bugs per bootup, so the warning
    isn't much of a nuisance.

    This feature comes with a number of fixes, for problems uncovered
    with it, so no messages are expected normally.

    - Another round of sched/numa optimizations and refinements, for
    CONFIG_NUMA_BALANCING=y.

    - Another round of sched/dl fixes and refinements.

    Plus various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
    sched: Add missing rcu protection to wake_up_all_idle_cpus
    sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
    sched/numa: Init numa balancing fields of init_task
    sched/deadline: Remove unnecessary definitions in cpudeadline.h
    sched/cpupri: Remove unnecessary definitions in cpupri.h
    sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
    sched/fair: Fix stale overloaded status in the busiest group finding logic
    sched: Move p->nr_cpus_allowed check to select_task_rq()
    sched/completion: Document when to use wait_for_completion_io_*()
    sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
    sched/fair: Kill task_struct::numa_entry and numa_group::task_list
    sched: Refactor task_struct to use numa_faults instead of numa_* pointers
    sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
    sched/deadline: Reschedule from switched_from_dl() after a successful pull
    sched/deadline: Push task away if the deadline is equal to curr during wakeup
    sched/deadline: Add deadline rq status print
    sched/deadline: Fix artificial overrun introduced by yield_task_dl()
    sched/rt: Clean up check_preempt_equal_prio()
    sched/core: Use dl_bw_of() under rcu_read_lock_sched()
    sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
    ...

    Linus Torvalds
     

20 Nov, 2014

1 commit

  • …git/rostedt/linux-trace into for-next

    Pull the beginning of seq_file cleanup from Steven:
    "I'm looking to clean up the seq_file code and to eventually merge the
    trace_seq code with seq_file as well, since they basically do the same thing.

    Part of this process is to remove the return code of seq_printf() and friends
    as they are rather inconsistent. It is better to use the new function
    seq_has_overflowed() if you want to stop processing when the buffer
    is full. Note, if the buffer is full, the seq_file code will throw away
    the contents, allocate a bigger buffer, and then call your code again
    to fill in the data. The only thing that breaking out of the function
    early does is to save a little time which is probably never noticed.

    I started with patches from Joe Perches and modified them as well.
    There's many more places that need to be updated before we can convert
    seq_printf() and friends to return void. But this patch set introduces
    the seq_has_overflowed() and does some initial updates."

    Al Viro
     

14 Nov, 2014

1 commit

  • fsnotify() needs to merge inode and mount marks lists when notifying
    groups about events so that ignore masks from inode marks are reflected
    in mount mark notifications and groups are notified in proper order
    (according to priorities).

    Currently the sorting of the lists done by fsnotify_add_inode_mark() /
    fsnotify_add_vfsmount_mark() and by fsnotify() differed, which resulted
    in ignore masks not being used in some cases.

    Fix the problem by always using the same comparison function when
    sorting / merging the mark lists.

    Thanks to Heinrich Schuchardt for improvements of my patch.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721
    Signed-off-by: Jan Kara
    Reported-by: Heinrich Schuchardt
    Tested-by: Heinrich Schuchardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

06 Nov, 2014

1 commit

  • seq_printf functions shouldn't really check the return value.
    Checking seq_has_overflowed() occasionally is used instead.

    Update vfs documentation.

    Link: http://lkml.kernel.org/p/e37e6e7b76acbdcc3bb4ab2a57c8f8ca1ae11b9a.1412031505.git.joe@perches.com

    Cc: David S. Miller
    Cc: Al Viro
    Signed-off-by: Joe Perches
    [ did a few clean ups ]
    Signed-off-by: Steven Rostedt

    Joe Perches
     

30 Oct, 2014

1 commit

  • During file system stress testing on 3.10 and 3.12 based kernels, the
    umount command occasionally hung in fsnotify_unmount_inodes in the
    section of code:

        spin_lock(&inode->i_lock);
        if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                spin_unlock(&inode->i_lock);
                continue;
        }

    As this section of code holds the global inode_sb_list_lock, eventually
    the system hangs trying to acquire the lock.

    Multiple crash dumps showed:

    The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
    back at itself. As this is not the value of list upon entry to the
    function, the kernel never exits the loop.

    To help narrow down the problem, the call to list_del_init in
    inode_sb_list_del was changed to list_del. This poisons the pointers in
    the i_sb_list and causes the kernel to panic if it traverses a freed
    inode.

    Subsequent stress testing panicked in fsnotify_unmount_inodes at the
    bottom of the list_for_each_entry_safe loop, showing that next_i had
    become free.

    We believe the root cause of the problem is that next_i is being freed
    during the window of time that the list_for_each_entry_safe loop
    temporarily releases inode_sb_list_lock to call fsnotify and
    fsnotify_inode_delete.

    The code in fsnotify_unmount_inodes attempts to prevent the freeing of
    inode and next_i by calling __iget. However, the code doesn't do the
    __iget call on next_i

        if i_count == 0 or
        if i_state & (I_FREEING | I_WILL_FREE)

    The patch addresses this issue by advancing next_i in the above two cases
    until we either find a next_i which we can __iget or we reach the end of
    the list. This makes the handling of next_i more closely match the
    handling of the variable "inode."

    The time to reproduce the hang is highly variable (from hours to days).
    We ran the stress test on a 3.10 kernel with the proposed patch for a
    week without failure.

    During list_for_each_entry_safe, next_i becomes free, causing the loop
    to never terminate. Advance next_i in those cases where __iget is not
    done.

    Signed-off-by: Jerry Hoemann
    Cc: Jeff Kirsher
    Cc: Ken Helias
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerry Hoemann
     

28 Oct, 2014

1 commit

  • inotify_read is a wait loop with sleeps in it. Wait loops rely on
    task_struct::state and so do sleeps, since that's the only means of
    actually sleeping. Therefore the nested sleeps destroy the wait-loop
    state, and the wait loop breaks the sleep functions that assume
    TASK_RUNNING (mutex_lock).

    Fix this by using the new woken_wake_function and wait_woken() machinery,
    which registers wakeups in wait and thereby allows shrinking the
    task_struct::state changes to the actual sleep part.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Al Viro
    Cc: Linus Torvalds
    Cc: tglx@linutronix.de
    Cc: ilya.dryomov@inktank.com
    Cc: umgwanakikbuti@gmail.com
    Cc: Robert Love
    Cc: Eric Paris
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Oleg Nesterov
    Link: http://lkml.kernel.org/r/20140924082242.254858080@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

12 Oct, 2014

1 commit

  • Pull file locking related changes from Jeff Layton:
    "This release is a little more busy for file locking changes than the
    last:

    - a set of patches from Kinglong Mee to fix the lockowner handling in
    knfsd
    - a pile of cleanups to the internal file lease API. This should get
    us a bit closer to allowing for setlease methods that can block.

    There are some dependencies between mine and Bruce's trees this cycle,
    and I based my tree on top of the requisite patches in Bruce's tree"

    * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits)
    locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING
    locks: flock_make_lock should return a struct file_lock (or PTR_ERR)
    locks: set fl_owner for leases to filp instead of current->files
    locks: give lm_break a return value
    locks: __break_lease cleanup in preparation of allowing direct removal of leases
    locks: remove i_have_this_lease check from __break_lease
    locks: move freeing of leases outside of i_lock
    locks: move i_lock acquisition into generic_*_lease handlers
    locks: define a lm_setup handler for leases
    locks: plumb a "priv" pointer into the setlease routines
    nfsd: don't keep a pointer to the lease in nfs4_file
    locks: clean up vfs_setlease kerneldoc comments
    locks: generic_delete_lease doesn't need a file_lock at all
    nfsd: fix potential lease memory leak in nfs4_setlease
    locks: close potential race in lease_get_mtime
    security: make security_file_set_fowner, f_setown and __f_setown void return
    locks: consolidate "nolease" routines
    locks: remove lock_may_read and lock_may_write
    lockd: rip out deferred lock handling from testlock codepath
    NFSD: Get reference of lockowner when coping file_lock
    ...

    Linus Torvalds
     

10 Oct, 2014

3 commits

  • According to commit 80af258867648 ("fanotify: groups can specify their
    f_flags for new fd"), file descriptors created as part of file access
    notification events inherit flags from the event_f_flags argument passed
    to syscall fanotify_init(2)[1].

    Unfortunately O_CLOEXEC is currently silently ignored.

    Indeed, event_f_flags are only given to dentry_open(), which only seems to
    care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
    open_check_o_direct() and O_LARGEFILE in generic_file_open().

    It's a pity, since, according to some lookup on various search engines and
    http://codesearch.debian.net/, there's already some userspace code which
    use O_CLOEXEC:

    - in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

    - in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

    - in examples [4] from "Filesystem monitoring in the Linux
    kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
    O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

    Additionally, since commit 48149e9d3a7e ("fanotify: check file flags
    passed in fanotify_init"), having O_CLOEXEC as part of fanotify_init()'s
    second argument is expressly allowed.

    So it seems expected to set close-on-exec flag on the file descriptors if
    userspace is allowed to request it with O_CLOEXEC.

    But Andrew Morton raised[6] the concern that enabling close-on-exec now
    might break existing applications which ask for O_CLOEXEC but expect the
    file descriptor to be inherited across exec().

    On the other hand, as reported by Mihai Dontu[7], the lack of
    close-on-exec on the file descriptor returned as part of a file access
    notification can break applications due to deadlock. So close-on-exec is
    needed for most applications.

    More, applications asking for close-on-exec are likely expecting it to be
    enabled, relying on O_CLOEXEC being effective. If not, it might weaken
    their security, as noted by Jan Kara[8].

    So this patch replaces the call to the macro get_unused_fd() with a call
    to get_unused_fd_flags(), passing the event_f_flags value as argument.
    This way the O_CLOEXEC flag in the second argument of the
    fanotify_init(2) syscall is interpreted and close-on-exec gets enabled
    when requested.

    [1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
    [2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
    [3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
    [4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
    [5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
    [6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
    [7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
    [8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz

    Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
    Signed-off-by: Yann Droneaud
    Reviewed-by: Jan Kara
    Reviewed-by: Heinrich Schuchardt
    Tested-by: Heinrich Schuchardt
    Cc: Mihai Donțu
    Cc: Pádraig Brady
    Cc: Heinrich Schuchardt
    Cc: Jan Kara
    Cc: Valdis Kletnieks
    Cc: Michael Kerrisk-manpages
    Cc: Lino Sanfilippo
    Cc: Richard Guy Briggs
    Cc: Eric Paris
    Cc: Al Viro
    Cc: Michael Kerrisk
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yann Droneaud
     
  • On some failure paths we may attempt to free user context even if it
    wasn't assigned yet. This will cause a NULL ptr deref and a kernel BUG.

    The path I was looking at is in inotify_new_group():

        oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL);
        if (unlikely(!oevent)) {
                fsnotify_destroy_group(group);
                return ERR_PTR(-ENOMEM);
        }

    fsnotify_destroy_group() would get called here, but
    group->inotify_data.user is only getting assigned later:

    group->inotify_data.user = get_current_user();

    Signed-off-by: Sasha Levin
    Cc: John McCutchan
    Cc: Robert Love
    Cc: Eric Paris
    Reviewed-by: Heinrich Schuchardt
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • No callers outside this file.

    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

07 Aug, 2014

3 commits

  • All other add functions for lists take the new item as first argument
    and the position where it is added as second argument. This was changed
    for no good reason in this function and makes using it unnecessarily
    confusing.

    The name was changed to hlist_add_behind() to cause unconverted code to
    generate a compile error instead of using the wrong parameter order.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Ken Helias
    Cc: "Paul E. McKenney"
    Acked-by: Jeff Kirsher [intel driver bits]
    Cc: Hugh Dickins
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ken Helias
     
  • Commit 85816794240b ("fanotify: Fix use after free for permission
    events") introduced a double free issue for permission events which are
    pending in group's notification queue while group is being destroyed.
    These events are freed from fanotify_handle_event() but they are not
    removed from groups notification queue and thus they get freed again
    from fsnotify_flush_notify().

    Fix the problem by removing permission events from notification queue
    before freeing them if we skip processing access response. Also expand
    comments in fanotify_release() to explain group shutdown in detail.

    Fixes: 85816794240b9659e66e4d9b0df7c6e814e5f603
    Signed-off-by: Jan Kara
    Reported-by: Douglas Leeder
    Tested-by: Douglas Leeder
    Reported-by: Heinrich Schuchardt
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Rename fsnotify_add_notify_event() to fsnotify_add_event() since the
    "notify" part is redundant. Rename fsnotify_remove_notify_event() and
    fsnotify_peek_notify_event() to fsnotify_remove_first_event() and
    fsnotify_peek_first_event() respectively, since the "notify" part is
    redundant there too and they really look at the first event in the queue.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Jan Kara
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

05 Jun, 2014

4 commits

  • Without this patch fanotify_init does not validate the value passed in
    event_f_flags.

    When a fanotify event is read from the fanotify file descriptor a new
    file descriptor is created where file.f_flags = event_f_flags.

    Internal and external open flags are stored together in field f_flags of
    struct file. Hence, an application might create file descriptors with
    internal flags like FMODE_EXEC, FMODE_NOCMTIME set.

    Jan Kara and Eric Paris both agreed that this is a bug and that the
    value of event_f_flags should be checked:
    https://lkml.org/lkml/2014/4/29/522
    https://lkml.org/lkml/2014/4/29/539

    This updated patch version considers the comments by Michael Kerrisk in
    https://lkml.org/lkml/2014/5/4/10

    With the patch, the value of event_f_flags is checked. When an invalid
    value is specified, error EINVAL is returned.

    Internal flags are disallowed.

    File creation flags are disallowed:
    O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TRUNC, and O_TTY_INIT.

    Flags which do not make sense with fanotify are disallowed:
    __O_TMPFILE, O_PATH, FASYNC, and O_DIRECT.

    This leaves us with the following allowed values:

    O_RDONLY, O_WRONLY, O_RDWR are basic functionality. They are stored in
    the bits given by O_ACCMODE.

    O_APPEND is working as expected. The value might be useful in a logging
    application which appends the current status each time the log is opened.

    O_LARGEFILE is needed for files exceeding 4GB on 32bit systems.

    O_NONBLOCK may be useful when monitoring slow devices like tapes.

    O_NDELAY is equal to O_NONBLOCK except for platform parisc.
    To avoid code breaking on parisc either both flags should be
    allowed or none. The patch allows both.

    __O_SYNC and O_DSYNC may be used to avoid data loss on power disruption.

    O_NOATIME may be useful to reduce disk activity.

    O_CLOEXEC may be useful, if separate processes shall be used to scan files.

    Once this patch is accepted, the fanotify_init.2 manpage has to be updated.

    Signed-off-by: Heinrich Schuchardt
    Reviewed-by: Jan Kara
    Cc: Michael Kerrisk
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
     
  • If fanotify_mark is called with an illegal value in its flags or mask
    arguments, it usually returns EINVAL.

    When fanotify_mark is called with FAN_MARK_FLUSH the argument flags is
    not checked for irrelevant flags like FAN_MARK_IGNORED_MASK.

    The patch removes this inconsistency.

    If an irrelevant flag is set error EINVAL is returned.

    Signed-off-by: Heinrich Schuchardt
    Acked-by: Michael Kerrisk
    Acked-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
     
  • Do not initialize private_destroy_list twice. list_replace_init()
    already takes care of initializing private_destroy_list. We don't need
    to initialize it with LIST_HEAD() beforehand.

    Signed-off-by: David Cohen
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Cohen
     
  • Originally from Tvrtko Ursulin (https://lkml.org/lkml/2011/1/12/112)

    Avoid having to provide a fake/invalid fd and path when flushing marks

    Currently, for a group to flush marks it has set, it needs to provide a
    fake or invalid (but resolvable) file descriptor and path when calling
    fanotify_mark. This patch pulls the flush handling up a bit so the file
    descriptor and path are completely ignored when flushing.

    I reworked the patch to be applicable again (the signature of
    fanotify_mark has changed since Tvrtko's work).

    Signed-off-by: Heinrich Schuchardt
    Cc: Tvrtko Ursulin
    Reviewed-by: Jan Kara
    Acked-by: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heinrich Schuchardt
     

07 May, 2014

1 commit

  • On 64-bit systems, O_LARGEFILE is automatically added to flags inside
    the open() syscall (also openat(), blkdev_open(), etc). Userspace
    therefore defines O_LARGEFILE to be 0 - you can use it, but it's a
    no-op. Everything should be O_LARGEFILE by default.

    But: when fanotify does create_fd() it uses dentry_open(), which skips
    all that. And userspace can't set O_LARGEFILE in fanotify_init()
    because it's defined to 0. So if fanotify gets an event regarding a
    large file, the read() will just fail with -EOVERFLOW.

    This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit
    systems, using the same test as open()/openat()/etc.

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821

    Signed-off-by: Will Woods
    Acked-by: Eric Paris
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Woods
     

04 Apr, 2014

1 commit

  • Move code moving event structure to access_list from copy_event_to_user()
    to fanotify_read() where it is more logical (so that we can immediately
    see in the main loop that we either move the event to a different list
    or free it). Also move special error handling for permission events
    from copy_event_to_user() to the main loop to have it in one place with
    error handling for normal events. This makes copy_event_to_user()
    really only copy the event to user without any side effects.

    Signed-off-by: Jan Kara
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara