Eric Lee / smarc-fsl-linux-kernel

16 Jul, 2017

1 commit

78dcf7342 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull ->s_options removal from Al Viro:
"Preparations for fsmount/fsopen stuff (coming next cycle). Everything
gets moved to explicit ->show_options(), killing ->s_options off +
some cosmetic bits around fs/namespace.c and friends. Basically, the
stuff needed to work with fsmount series with minimum of conflicts
with other work.

It's not strictly required for this merge window, but it would reduce
the PITA during the coming cycle, so it would be nice to have those
bits and pieces out of the way"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
isofs: Fix isofs_show_options()
VFS: Kill off s_options and helpers
orangefs: Implement show_options
9p: Implement show_options
isofs: Implement show_options
afs: Implement show_options
affs: Implement show_options
befs: Implement show_options
spufs: Implement show_options
bpf: Implement show_options
ramfs: Implement show_options
pstore: Implement show_options
omfs: Implement show_options
hugetlbfs: Implement show_options
VFS: Don't use save/replace_mount_options if not using generic_show_options
VFS: Provide empty name qstr
VFS: Make get_filesystem() return the affected filesystem
VFS: Clean up whitespace in fs/namespace.c and fs/super.c
Provide a function to create a NUL-terminated string from unterminated data

Linus Torvalds
2017-07-16 03:00:42 +0800

11 Jul, 2017

1 commit

b17c070fb fs/dcache.c: fix spin lockup issue on nlru->lock ... Browse Code »

__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer
duration if there are more number of items in the lru list. As per the
current code, it can hold the spin lock for upto maximum UINT_MAX
entries at a time. So if there are more number of items in the lru
list, then "BUG: spinlock lockup suspected" is observed in the below
path:

spin_bug+0x90
do_raw_spin_lock+0xfc
_raw_spin_lock+0x28
list_lru_add+0x28
dput+0x1c8
path_put+0x20
terminate_walk+0x3c
path_lookupat+0x100
filename_lookup+0x6c
user_path_at_empty+0x54
SyS_faccessat+0xd0
el0_svc_naked+0x24

This nlru->lock is acquired by another CPU in this path -

d_lru_shrink_move+0x34
dentry_lru_isolate_shrink+0x48
__list_lru_walk_one.isra.10+0x94
list_lru_walk_node+0x40
shrink_dcache_sb+0x60
do_remount_sb+0xbc
do_emergency_remount+0xb0
process_one_work+0x228
worker_thread+0x2e0
kthread+0xf4
ret_from_fork+0x10

Fix this lockup by reducing the number of entries to be shrinked from
the lru list to 1024 at once. Also, add cond_resched() before
processing the lru list again.

Link: http://marc.info/?t=149722864900001&r=1&w=2
Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala
Suggested-by: Jan Kara
Suggested-by: Vladimir Davydov
Acked-by: Vladimir Davydov
Cc: Alexander Polakov
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sahitya Tummala
2017-07-11 07:32:33 +0800

09 Jul, 2017

1 commit

b8d4c1f9f Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc filesystem updates from Al Viro:
"Assorted normal VFS / filesystems stuff..."

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
dentry name snapshots
Make statfs properly return read-only state after emergency remount
fs/dcache: init in_lookup_hashtable
minix: Deinline get_block, save 2691 bytes
fs: Reorder inode_owner_or_capable() to avoid needless
fs: warn in case userspace lied about modprobe return

Linus Torvalds
2017-07-09 01:50:54 +0800

08 Jul, 2017

1 commit

49d31c2f3 dentry name snapshots ... Browse Code »

take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified). In either case the pointer to stable
string is stored into the same structure.

dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().

Intended use:
struct name_snapshot s;

take_dentry_name_snapshot(&s, dentry);
...
access s.name
...
release_dentry_name_snapshot(&s);

Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.

Signed-off-by: Al Viro

Al Viro
2017-07-08 08:09:10 +0800

07 Jul, 2017

1 commit

3d375d785 mm: update callers to use HASH_ZERO flag ... Browse Code »

Update dcache, inode, pid, mountpoint, and mount hash tables to use
HASH_ZERO, and remove initialization after allocations. In case of
places where HASH_EARLY was used such as in __pv_init_lock_hash the
zeroed hash table was already assumed, because memblock zeroes the
memory.

CPU: SPARC M6, Memory: 7T
Before fix:
Dentry cache hash table entries: 1073741824
Inode-cache hash table entries: 536870912
Mount-cache hash table entries: 16777216
Mountpoint-cache hash table entries: 16777216
ftrace: allocating 20414 entries in 40 pages
Total time: 11.798s

After fix:
Dentry cache hash table entries: 1073741824
Inode-cache hash table entries: 536870912
Mount-cache hash table entries: 16777216
Mountpoint-cache hash table entries: 16777216
ftrace: allocating 20414 entries in 40 pages
Total time: 3.198s

CPU: Intel Xeon E5-2630, Memory: 2.2T:
Before fix:
Dentry cache hash table entries: 536870912
Inode-cache hash table entries: 268435456
Mount-cache hash table entries: 8388608
Mountpoint-cache hash table entries: 8388608
CPU: Physical Processor ID: 0
Total time: 3.245s

After fix:
Dentry cache hash table entries: 536870912
Inode-cache hash table entries: 268435456
Mount-cache hash table entries: 8388608
Mountpoint-cache hash table entries: 8388608
CPU: Physical Processor ID: 0
Total time: 3.244s

Link: http://lkml.kernel.org/r/1488432825-92126-4-git-send-email-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin
Reviewed-by: Babu Moger
Cc: David Miller
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Tatashin
2017-07-07 07:24:33 +0800

06 Jul, 2017

1 commit

cdf01226b VFS: Provide empty name qstr ... Browse Code »

Provide an empty name (ie. "") qstr for general use.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2017-07-06 15:27:09 +0800

30 Jun, 2017

1 commit

6916363f3 fs/dcache: init in_lookup_hashtable ... Browse Code »

in_lookup_hashtable was introduced in commit 94bdd655caba ("parallel
lookups machinery, part 3") and never initialized but since it is in
the data it is all zeros. But we need this for -RT.

Cc: Alexander Viro
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Al Viro

Sebastian Andrzej Siewior
2017-06-30 08:17:14 +0800

15 Jun, 2017

1 commit

81be24d26 Hang/soft lockup in d_invalidate with simultaneous calls ... Browse Code »

It's not hard to trigger a bunch of d_invalidate() on the same
dentry in parallel. They end up fighting each other - any
dentry picked for removal by one will be skipped by the rest
and we'll go for the next iteration through the entire
subtree, even if everything is being skipped. Morevoer, we
immediately go back to scanning the subtree. The only thing
we really need is to dissolve all mounts in the subtree and
as soon as we've nothing left to do, we can just unhash the
dentry and bugger off.

Signed-off-by: Al Viro

Al Viro
2017-06-15 18:52:09 +0800

03 May, 2017

1 commit

563f40019 fs: don't set *REFERENCED on single use objects ... Browse Code »

By default we set DCACHE_REFERENCED and I_REFERENCED on any dentry or
inode we create. This is problematic as this means that it takes two
trips through the LRU for any of these objects to be reclaimed,
regardless of their actual lifetime. With enough pressure from these
caches we can easily evict our working set from page cache with single
use objects. So instead only set *REFERENCED if we've already been
added to the LRU list. This means that we've been touched since the
first time we were accessed, and so more likely to need to hang out in
cache.

To illustrate this issue I wrote the following scripts

https://github.com/josefbacik/debug-scripts/tree/master/cache-pressure

on my test box. It is a single socket 4 core CPU with 16gib of RAM and
I tested on an Intel 2tib NVME drive. The cache-pressure.sh script
creates a new file system and creates 2 6.5gib files in order to take up
13gib of the 16gib of ram with pagecache. Then it runs a test program
that reads these 2 files in a loop, and keeps track of how often it has
to read bytes for each loop. On an ideal system with no pressure we
should have to read 0 bytes indefinitely. The second thing this script
does is start a fs_mark job that creates a ton of 0 length files,
putting pressure on the system with slab only allocations. On exit the
script prints out how many bytes were read by the read-file program.
The results are as follows

Without patch:
/mnt/btrfs-test/reads/file1: total read during loops 27262988288
/mnt/btrfs-test/reads/file2: total read during loops 27262976000

With patch:
/mnt/btrfs-test/reads/file2: total read during loops 18640457728
/mnt/btrfs-test/reads/file1: total read during loops 9565376512

This patch results in a 50% reduction of the amount of pages evicted
from our working set.

Signed-off-by: Josef Bacik
Signed-off-by: Al Viro

Josef Bacik
2017-05-03 23:47:05 +0800

10 Jan, 2017

1 commit

3895dbf89 mnt: Protect the mountpoint hashtable with mount_lock ... Browse Code »

Protecting the mountpoint hashtable with namespace_sem was sufficient
until a call to umount_mnt was added to mntput_no_expire. At which
point it became possible for multiple calls of put_mountpoint on
the same hash chain to happen on the same time.

Kristen Johansen reported:
> This can cause a panic when simultaneous callers of put_mountpoint
> attempt to free the same mountpoint. This occurs because some callers
> hold the mount_hash_lock, while others hold the namespace lock. Some
> even hold both.
>
> In this submitter's case, the panic manifested itself as a GP fault in
> put_mountpoint() when it called hlist_del() and attempted to dereference
> a m_hash.pprev that had been poisioned by another thread.

Al Viro observed that the simple fix is to switch from using the namespace_sem
to the mount_lock to protect the mountpoint hash table.

I have taken Al's suggested patch moved put_mountpoint in pivot_root
(instead of taking mount_lock an additional time), and have replaced
new_mountpoint with get_mountpoint a function that does the hash table
lookup and addition under the mount_lock. The introduction of get_mounptoint
ensures that only the mount_lock is needed to manipulate the mountpoint
hashtable.

d_set_mounted is modified to only set DCACHE_MOUNTED if it is not
already set. This allows get_mountpoint to use the setting of
DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry
happens exactly once.

Cc: stable@vger.kernel.org
Fixes: ce07d891a089 ("mnt: Honor MNT_LOCKED when detaching mounts")
Reported-by: Krister Johansen
Suggested-by: Al Viro
Acked-by: Al Viro
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2017-01-10 08:34:43 +0800

25 Dec, 2016

1 commit

7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

04 Dec, 2016

2 commits

f74e7b33c vfs: remove unused have_submounts() function ... Browse Code »

Now that path_has_submounts() has been added have_submounts() is no
longer used so remove it.

Link: http://lkml.kernel.org/r/20161011053428.27645.12310.stgit@pluto.themaw.net
Signed-off-by: Ian Kent
Cc: Al Viro
Cc: Eric W. Biederman
Cc: Omar Sandoval
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Ian Kent
2016-12-04 09:51:49 +0800
01619491a vfs: add path_has_submounts() ... Browse Code »

d_mountpoint() can only be used reliably to establish if a dentry is
not mounted in any namespace. It isn't aware of the possibility there
may be multiple mounts using the given dentry, possibly in a different
namespace.

Add function, path_has_submounts(), that checks is a struct path contains
mounts (or is a mountpoint itself) to handle this case.

Link: http://lkml.kernel.org/r/20161011053403.27645.55242.stgit@pluto.themaw.net
Signed-off-by: Ian Kent
Cc: Al Viro
Cc: Eric W. Biederman
Cc: Omar Sandoval
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Ian Kent
2016-12-04 09:51:47 +0800

07 Aug, 2016

1 commit

fe64f3283 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
"Assorted cleanups and fixes.

In the "trivial API change" department - ->d_compare() losing 'parent'
argument"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cachefiles: Fix race between inactivating and culling a cache object
9p: use clone_fid()
9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
vfs: make dentry_needs_remove_privs() internal
vfs: remove file_needs_remove_privs()
vfs: fix deadlock in file_remove_privs() on overlayfs
get rid of 'parent' argument of ->d_compare()
cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
affs ->d_compare(): don't bother with ->d_inode
fold _d_rehash() and __d_rehash() together
fold dentry_rcuwalk_invalidate() into its only remaining caller

Linus Torvalds
2016-08-07 22:01:14 +0800

06 Aug, 2016

1 commit

835c92d43 Merge branch 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull qstr constification updates from Al Viro:
"Fairly self-contained bunch - surprising lot of places passes struct
qstr * as an argument when const struct qstr * would suffice; it
complicates analysis for no good reason.

I'd prefer to feed that separately from the assorted fixes (those are
in #for-linus and with somewhat trickier topology)"

* 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qstr: constify instances in adfs
qstr: constify instances in lustre
qstr: constify instances in f2fs
qstr: constify instances in ext2
qstr: constify instances in vfat
qstr: constify instances in procfs
qstr: constify instances in fuse
qstr constify instances in fs/dcache.c
qstr: constify instances in nfs
qstr: constify instances in ocfs2
qstr: constify instances in autofs4
qstr: constify instances in hfs
qstr: constify instances in hfsplus
qstr: constify instances in logfs
qstr: constify dentry_init_security

Linus Torvalds
2016-08-06 21:49:02 +0800

01 Aug, 2016

1 commit

6fa67e707 get rid of 'parent' argument of ->d_compare() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-08-01 04:37:25 +0800

30 Jul, 2016

2 commits

15d3c589f fold _d_rehash() and __d_rehash() together ... Browse Code »

The only place where we feed to __d_rehash() something other than
d_hash(dentry->d_name.hash) is __d_move(), where we give it d_hash
of another dentry. Postpone rehashing until we'd switched the
names and we are rid of that exception, along with the need to
keep _d_rehash() and __d_rehash() separate.

Signed-off-by: Al Viro

Al Viro
2016-07-30 05:45:21 +0800
d614146d1 fold dentry_rcuwalk_invalidate() into its only remaining caller ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-07-30 05:28:58 +0800

29 Jul, 2016

2 commits

6784725ab Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"Assorted cleanups and fixes.

Probably the most interesting part long-term is ->d_init() - that will
have a bunch of followups in (at least) ceph and lustre, but we'll
need to sort the barrier-related rules before it can get used for
really non-trivial stuff.

Another fun thing is the merge of ->d_iput() callers (dentry_iput()
and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all
except the one in __d_lookup_lru())"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
fs/dcache.c: avoid soft-lockup in dput()
vfs: new d_init method
vfs: Update lookup_dcache() comment
bdev: get rid of ->bd_inodes
Remove last traces of ->sync_page
new helper: d_same_name()
dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends()
vfs: clean up documentation
vfs: document ->d_real()
vfs: merge .d_select_inode() into .d_real()
unify dentry_iput() and dentry_unlink_inode()
binfmt_misc: ->s_root is not going anywhere
drop redundant ->owner initializations
ufs: get rid of redundant checks
orangefs: constify inode_operations
missed comment updates from ->direct_IO() prototype change
file_inode(f)->i_mapping is f->f_mapping
trim fsnotify hooks a bit
9p: new helper - v9fs_parent_fid()
debugfs: ->d_parent is never NULL or negative
...

Linus Torvalds
2016-07-29 03:59:05 +0800
554828ee0 Merge branch 'salted-string-hash' ... Browse Code »

This changes the vfs dentry hashing to mix in the parent pointer at the
_beginning_ of the hash, rather than at the end.

That actually improves both the hash and the code generation, because we
can move more of the computation to the "static" part of the dcache
setup, and do less at lookup runtime.

It turns out that a lot of other hash users also really wanted to mix in
a base pointer as a 'salt' for the hash, and so the slightly extended
interface ends up working well for other cases too.

Users that want a string hash that is purely about the string pass in a
'salt' pointer of NULL.

* merge branch 'salted-string-hash':
fs/dcache.c: Save one 32-bit multiply in dcache lookup
vfs: make the string hashes salt the hash

Linus Torvalds
2016-07-29 03:26:31 +0800

25 Jul, 2016

3 commits

47be61845 fs/dcache.c: avoid soft-lockup in dput() ... Browse Code »

We triggered soft-lockup under stress test which
open/access/write/close one file concurrently on more than
five different CPUs:

WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
...
[] dput+0x100/0x298
[] terminate_walk+0x4c/0x60
[] path_lookupat+0x5cc/0x7a8
[] filename_lookup+0x38/0xf0
[] user_path_at_empty+0x78/0xd0
[] user_path_at+0x1c/0x28
[] SyS_faccessat+0xb4/0x230

->d_lock trylock may failed many times because of concurrently
operations, and dput() may execute a long time.

Fix this by replacing cpu_relax() with cond_resched().
dput() used to be sleepable, so make it sleepable again
should be safe.

Cc:
Signed-off-by: Wei Fang
Signed-off-by: Al Viro

Wei Fang
2016-07-25 04:37:16 +0800
285b102d3 vfs: new d_init method ... Browse Code »

Allow filesystem to initialize dentry at allocation time.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2016-07-25 04:36:29 +0800
17648b871 Merge branch 'test.d_iput' into work.misc Browse Code »

Al Viro
2016-07-25 04:36:04 +0800

21 Jul, 2016

1 commit

9aba36dea qstr constify instances in fs/dcache.c ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-07-21 11:30:06 +0800

01 Jul, 2016

4 commits

b223f4e21 Merge branch 'd_real' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into work.misc Browse Code »

Al Viro
2016-07-01 11:34:49 +0800
d4c91a8f7 new helper: d_same_name() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-07-01 11:30:44 +0800
ae0a843c7 dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() ... Browse Code »

lockless_dereference() was added which can be used in place of
hard-coding smp_read_barrier_depends().

Signed-off-by: He Kuang
Signed-off-by: Al Viro

He Kuang
2016-07-01 11:30:35 +0800
c074cefcc Merge branch 'for-linus' into work.misc Browse Code »

Al Viro
2016-07-01 11:30:06 +0800

30 Jun, 2016

1 commit

2d902671c vfs: merge .d_select_inode() into .d_real() ... Browse Code »

The two methods essentially do the same: find the real dentry/inode
belonging to an overlay dentry. The difference is in the usage:

vfs_open() uses ->d_select_inode() and expects the function to perform
copy-up if necessary based on the open flags argument.

file_dentry() uses ->d_real() passing in the overlay dentry as well as the
underlying inode.

vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real()
with a zero inode would have worked just as well here.

This patch merges the functionality of ->d_select_inode() into ->d_real()
by adding an 'open_flags' argument to the latter.

[Al Viro] Make the signature of d_real() match that of ->d_real() again.
And constify the inode argument, while we are at it.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-06-30 14:53:27 +0800

20 Jun, 2016

1 commit

e7d6ef979 fix idiotic braino in d_alloc_parallel() ... Browse Code »

Check for d_unhashed() while searching in in-lookup hash was absolutely
wrong. Worse, it masked a deadlock on dget() done under bitlock that
nests inside ->d_lock. Thanks to J. R. Okajima for spotting it.

Spotted-by: "J. R. Okajima"
Wearing-brown-paperbag: Al Viro
Signed-off-by: Al Viro

Al Viro
2016-06-20 22:07:42 +0800

12 Jun, 2016

1 commit

703b5faf2 fs/dcache.c: Save one 32-bit multiply in dcache lookup ... Browse Code »

Noe that we're mixing in the parent pointer earlier, we
don't need to use hash_32() to mix its bits. Instead, we can
just take the msbits of the hash value directly.

For those applications which use the partial_name_hash(),
move the multiply to end_name_hash.

Signed-off-by: George Spelvin
Signed-off-by: Linus Torvalds

George Spelvin
2016-06-12 05:57:56 +0800

11 Jun, 2016

1 commit

8387ff257 vfs: make the string hashes salt the hash ... Browse Code »

We always mixed in the parent pointer into the dentry name hash, but we
did it late at lookup time. It turns out that we can simplify that
lookup-time action by salting the hash with the parent pointer early
instead of late.

A few other users of our string hashes also wanted to mix in their own
pointers into the hash, and those are updated to use the same mechanism.

Hash users that don't have any particular initial salt can just use the
NULL pointer as a no-salt.

Cc: Vegard Nossum
Cc: George Spelvin
Cc: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-06-11 11:21:46 +0800

10 Jun, 2016

1 commit

ba65dc5ef much milder d_walk() race ... Browse Code »

d_walk() relies upon the tree not getting rearranged under it without
rename_lock being touched. And we do grab rename_lock around the
places that change the tree topology. Unfortunately, branch reordering
is just as bad from d_walk() POV and we have two places that do it
without touching rename_lock - one in handling of cursors (for ramfs-style
directories) and another in autofs. autofs one is a separate story; this
commit deals with the cursors.
* mark cursor dentries explicitly at allocation time
* make __dentry_kill() leave ->d_child.next pointing to the next
non-cursor sibling, making sure that it won't be moved around unnoticed
before the parent is relocked on ascend-to-parent path in d_walk().
* make d_walk() skip cursors explicitly; strictly speaking it's
not necessary (all callbacks we pass to d_walk() are no-ops on cursors),
but it makes analysis easier.

Signed-off-by: Al Viro

Al Viro
2016-06-10 23:32:47 +0800

08 Jun, 2016

1 commit

3d56c25e3 fix d_walk()/non-delayed __d_free() race ... Browse Code »

Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay. Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.

Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover. Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.

Cc: stable@vger.kernel.org # v3.2+ (and watch out for __d_materialise_dentry())
Signed-off-by: Al Viro

Al Viro
2016-06-08 09:26:55 +0800

30 May, 2016

2 commits

550dce01d unify dentry_iput() and dentry_unlink_inode() ... Browse Code »

There is a lot of duplication between dentry_unlink_inode() and dentry_iput().
The only real difference is that dentry_unlink_inode() bumps ->d_seq and
dentry_iput() doesn't. The argument of the latter is known to have been
unhashed, so anybody who might've found it in RCU lookup would already be
doomed to a ->d_seq mismatch. And we want to avoid pointless smp_rmb() there.

This patch makes dentry_unlink_inode() bump ->d_seq only for hashed dentries.
It's safe (d_delete() calls that sucker only if we are holding the only
reference to dentry, so rehash is not going to happen) and it allows
to use dentry_unlink_inode() in __dentry_kill() and get rid of dentry_iput().

The interesting question here is profiling; it *is* a hot path, and extra
conditional jumps in there might or might not be painful.

Signed-off-by: Al Viro

Al Viro
2016-05-30 08:28:22 +0800
affda4841 trim fsnotify hooks a bit ... Browse Code »

fsnotify_d_move()/__fsnotify_d_instantiate()/__fsnotify_update_dcache_flags()
are identical to each other, regardless of the config.

Signed-off-by: Al Viro

Al Viro
2016-05-30 06:35:12 +0800

29 May, 2016

2 commits

7e0fb73c5 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux ... Browse Code »

Pull string hash improvements from George Spelvin:
"This series does several related things:

- Makes the dcache hash (fs/namei.c) useful for general kernel use.

(Thanks to Bruce for noticing the zero-length corner case)

- Converts the string hashes in to use the
above.

- Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
32-bit multiplies will do well enough.

- Rids the world of the bad hash multipliers in hash_32.

This finishes the job started in commit 689de1d6ca95 ("Minimal
fix-up of bad hashing behavior of hash_64()")

The vast majority of Linux architectures have hardware support for
32x32-bit multiply and so derive no benefit from "simplified"
multipliers.

The few processors that do not (68000, h8/300 and some models of
Microblaze) have arch-specific implementations added. Those
patches are last in the series.

- Overhauls the dcache hash mixing.

The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
Replaced with a much more careful design that's simultaneously
faster and better. (My own invention, as there was noting suitable
in the literature I could find. Comments welcome!)

- Modify the hash_name() loop to skip the initial HASH_MIX(). This
would let us salt the hash if we ever wanted to.

- Sort out partial_name_hash().

The hash function is declared as using a long state, even though
it's truncated to 32 bits at the end and the extra internal state
contributes nothing to the result. And some callers do odd things:

- fs/hfs/string.c only allocates 32 bits of state
- fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

- Modify bytemask_from_count to handle inputs of 1..sizeof(long)
rather than 0..sizeof(long)-1. This would simplify users other
than full_name_hash"

Special thanks to Bruce Fields for testing and finding bugs in v1. (I
learned some humbling lessons about "obviously correct" code.)

On the arch-specific front, the m68k assembly has been tested in a
standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
h8300: Add
microblaze: Add
m68k: Add
: Add support for architecture-specific functions
fs/namei.c: Improve dcache hash function
Eliminate bad hash multipliers from hash_32() and hash_64()
Change hash_64() return value to 32 bits
: Define hash_str() in terms of hashlen_string()
fs/namei.c: Add hashlen_string() function
Pull out string hash to

Linus Torvalds
2016-05-29 07:15:25 +0800
fcfd2fbf2 fs/namei.c: Add hashlen_string() function ... Browse Code »

We'd like to make more use of the highly-optimized dcache hash functions
throughout the kernel, rather than have every subsystem create its own,
and a function that hashes basic null-terminated strings is required
for that.

(The name is to emphasize that it returns both hash and length.)

It's actually useful in the dcache itself, specifically d_alloc_name().
Other uses in the next patch.

full_name_hash() is also tweaked to make it more generally useful:
1) Take a "char *" rather than "unsigned char *" argument, to
be consistent with hash_name().
2) Handle zero-length inputs. If we want more callers, we don't want
to make them worry about corner cases.

Signed-off-by: George Spelvin

George Spelvin
2016-05-29 03:42:50 +0800

19 May, 2016

1 commit

9e17632c0 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs cleanups from Al Viro:
"Assorted cleanups and fixes all over the place"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: only charge written data against RLIMIT_CORE
coredump: get rid of coredump_params->written
ecryptfs_lookup(): try either only encrypted or plaintext name
ecryptfs: avoid multiple aliases for directories
bpf: reject invalid names right in ->lookup()
__d_alloc(): treat NULL name as QSTR("/", 1)
mtd: switch ubi_open_volume_path() to vfs_stat()
mtd: switch open_mtd_by_chdev() to use of vfs_stat()

Linus Torvalds
2016-05-19 02:51:59 +0800

03 May, 2016

1 commit

9902af79c parallel lookups: actual switch to rwsem ... Browse Code »

ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work

Signed-off-by: Al Viro

Al Viro
2016-05-03 07:49:28 +0800