Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

09 Apr, 2014

1 commit

e9f37d3a8 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux ... Browse Code »

Pull drm updates from Dave Airlie:
"Highlights:

- drm:

Generic display port aux features, primary plane support, drm
master management fixes, logging cleanups, enforced locking checks
(instead of docs), documentation improvements, minor number
handling cleanup, pseudofs for shared inodes.

- ttm:

add ability to allocate from both ends

- i915:

broadwell features, power domain and runtime pm, per-process
address space infrastructure (not enabled)

- msm:

power management, hdmi audio support

- nouveau:

ongoing GPU fault recovery, initial maxwell support, random fixes

- exynos:

refactored driver to clean up a lot of abstraction, DP support
moved into drm, LVDS bridge support added, parallel panel support

- gma500:

SGX MMU support, SGX irq handling, asle irq work fixes

- radeon:

video engine bringup, ring handling fixes, use dp aux helpers

- vmwgfx:

add rendernode support"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits)
DRM: armada: fix corruption while loading cursors
drm/dp_helper: don't return EPROTO for defers (v2)
drm/bridge: export ptn3460_init function
drm/exynos: remove MODULE_DEVICE_TABLE definitions
ARM: dts: exynos4412-trats2: enable exynos/fimd node
ARM: dts: exynos4210-trats: enable exynos/fimd node
ARM: dts: exynos4412-trats2: add panel node
ARM: dts: exynos4210-trats: add panel node
ARM: dts: exynos4: add MIPI DSI Master node
drm/panel: add S6E8AA0 driver
ARM: dts: exynos4210-universal_c210: add proper panel node
drm/panel: add ld9040 driver
panel/ld9040: add DT bindings
panel/s6e8aa0: add DT bindings
drm/exynos: add DSIM driver
exynos/dsim: add DT bindings
drm/exynos: disallow fbdev initialization if no device is connected
drm/mipi_dsi: create dsi devices only for nodes with reg property
drm/mipi_dsi: add flags to DSI messages
Skip intel_crt_init for Dell XPS 8700
...

Linus Torvalds
2014-04-09 00:52:16 +0800

01 Apr, 2014

1 commit

da1ce0670 vfs: add cross-rename ... Browse Code »
26

If flags contain RENAME_EXCHANGE then exchange source and destination files.
There's no restriction on the type of the files; e.g. a directory can be
exchanged with a symlink.

Signed-off-by: Miklos Szeredi
Reviewed-by: Jan Kara
Reviewed-by: J. Bruce Fields

Miklos Szeredi
2014-04-01 23:08:43 +0800

31 Mar, 2014

1 commit

0654a65f2 Merge tag 'v3.14' into drm-intel-next-queued ... Browse Code »

Linux 3.14

The vt-d w/a merged late in 3.14-rc needs a bit of fine-tuning, hence
backmerge.

Conflicts:
drivers/gpu/drm/i915/i915_gem_gtt.c
drivers/gpu/drm/i915/intel_ddi.c
drivers/gpu/drm/i915/intel_dp.c

All trivial adjacent lines changed type conflicts, so trivial git
doesn't even show them in the merg commit.

Signed-off-by: Daniel Vetter

Daniel Vetter
2014-03-31 16:45:15 +0800

23 Mar, 2014

1 commit

e825196d4 make prepend_name() work correctly when called with negative *buflen ... Browse Code »

In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative.
So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of subtraction result, of course).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Al Viro
2014-03-23 12:28:40 +0800

16 Mar, 2014

1 commit

31bbe16f6 drm: add pseudo filesystem for shared inodes ... Browse Code »
5

Our current DRM design uses a single address_space for all users of the
same DRM device. However, there is no way to create an anonymous
address_space without an underlying inode. Therefore, we wait for the
first ->open() callback on a registered char-dev and take-over the inode
of the char-dev. This worked well so far, but has several drawbacks:
- We screw with FS internals and rely on some non-obvious invariants like
inode->i_mapping being the same as inode->i_data for char-devs.
- We don't have any address_space prior to the first ->open() from
user-space. This leads to ugly fallback code and we cannot allocate
global objects early.

As pointed out by Al-Viro, fs/anon_inode.c is *not* supposed to be used by
drivers for anonymous inode-allocation. Therefore, this patch follows the
proposed alternative solution and adds a pseudo filesystem mount-point to
DRM. We can then allocate private inodes including a private address_space
for each DRM device at initialization time.

Note that we could use:
sysfs_get_inode(sysfs_mnt->mnt_sb, drm_device->dev->kobj.sd);
to get access to the underlying sysfs-inode of a "struct device" object.
However, most of this information is currently hidden and it's not clear
whether this address_space is suitable for driver access. Thus, unless
linux allows anonymous address_space objects or driver-core provides a
public inode per device, we're left with our own private internal mount
point.

Cc: Al Viro
Signed-off-by: David Herrmann

David Herrmann
2014-03-16 19:17:03 +0800

27 Jan, 2014

1 commit

f65008015 __dentry_path() fixes ... Browse Code »

* we need to save the starting point for restarts
* reject pathologically short buffers outright

Spotted-by: Denys Vlasenko
Spotted-by: Oleg Nesterov
Signed-off-by: Al Viro

Al Viro
2014-01-27 01:37:55 +0800

26 Jan, 2014

1 commit

a8323da03 vfs: Remove second variable named error in __dentry_path ... Browse Code »

In commit 232d2d60aa5469bb097f55728f65146bd49c1d25
Author: Waiman Long
Date: Mon Sep 9 12:18:13 2013 -0400

dcache: Translating dentry into pathname without taking rename_lock

The __dentry_path locking was changed and the variable error was
intended to be moved outside of the loop. Unfortunately the inner
declaration of error was not removed. Resulting in a version of
__dentry_path that will never return an error.

Remove the problematic inner declaration of error and allow
__dentry_path to return errors once again.

Cc: stable@vger.kernel.org
Cc: Waiman Long
Signed-off-by: "Eric W. Biederman"
Signed-off-by: Al Viro

Eric W. Biederman
2014-01-26 21:26:43 +0800

18 Jan, 2014

1 commit

48ba620aa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull namespace fixes from Eric Biederman:
"This is a set of 3 regression fixes.

This fixes /proc/mounts when using "ip netns add " to display
the actual mount point.

This fixes a regression in clone that broke lxc-attach.

This fixes a regression in the permission checks for mounting /proc
that made proc unmountable if binfmt_misc was in use. Oops.

My apologies for sending this pull request so late. Al Viro gave
interesting review comments about the d_path fix that I wanted to
address in detail before I sent this pull request. Unfortunately a
bad round of colds kept from addressing that in detail until today.
The executive summary of the review was:

Al: Is patching d_path really sufficient?
The prepend_path, d_path, d_absolute_path, and __d_path family of
functions is a really mess.

Me: Yes, patching d_path is really sufficient. Yes, the code is mess.
No it is not appropriate to rewrite all of d_path for a regression
that has existed for entirely too long already, when a two line
change will do"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Fix a regression in mounting proc
fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)
vfs: In d_path don't call d_dname on a mount point

Linus Torvalds
2014-01-18 09:29:36 +0800

13 Dec, 2013

1 commit

a5c21dcef dcache: allow word-at-a-time name hashing with big-endian CPUs ... Browse Code »

When explicitly hashing the end of a string with the word-at-a-time
interface, we have to be careful which end of the word we pick up.

On big-endian CPUs, the upper-bits will contain the data we're after, so
ensure we generate our masks accordingly (and avoid hashing whatever
random junk may have been sitting after the string).

This patch adds a new dcache helper, bytemask_from_count, which creates
a mask appropriate for the CPU endianness.

Cc: Al Viro
Signed-off-by: Will Deacon
Signed-off-by: Linus Torvalds

Will Deacon
2013-12-13 02:39:01 +0800

27 Nov, 2013

1 commit

f48cfddc6 vfs: In d_path don't call d_dname on a mount point ... Browse Code »
13

Aditya Kali (adityakali@google.com) wrote:
> Commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
> "proc: Fix the namespace inode permission checks." converted
> the namespace files into symlinks. The same commit changed
> the way namespace bind mounts appear in /proc/mounts:
> $ mount --bind /proc/self/ns/ipc /mnt/ipc
> Originally:
> $ cat /proc/mounts | grep ipc
> proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
>
> After commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
> $ cat /proc/mounts | grep ipc
> proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
>
> This breaks userspace which expects the 2nd field in
> /proc/mounts to be a valid path.

The symlink /proc//ns/{ipc,mnt,net,pid,user,uts} point to
dentries allocated with d_alloc_pseudo that we can mount, and
that have interesting names printed out with d_dname.

When these files are bind mounted /proc/mounts is not currently
displaying the mount point correctly because d_dname is called instead
of just displaying the path where the file is mounted.

Solve this by adding an explicit check to distinguish mounted pseudo
inodes and unmounted pseudo inodes. Unmounted pseudo inodes always
use mount of their filesstem as the mnt_root in their path making
these two cases easy to distinguish.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn
Reported-by: Aditya Kali
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-11-27 12:53:58 +0800

16 Nov, 2013

3 commits

31dec1327 fold try_to_ascend() into the sole remaining caller ... Browse Code »

There used to be a bunch of tree-walkers in dcache.c, all alike.
try_to_ascend() had been introduced to abstract a piece of logics
duplicated in all of them. These days all these tree-walkers are
implemented via the same iterator (d_walk()), which is the only
remaining caller of try_to_ascend(), so let's fold it back...

Signed-off-by: Al Viro

Al Viro
2013-11-16 11:04:17 +0800
482db9066 dcache.c: get rid of pointless macros ... Browse Code »
2

D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()).
At this point they are actively misguiding - they imply that values are
compiler constants, which is no longer true.

Signed-off-by: Al Viro

Al Viro
2013-11-16 11:04:17 +0800
2bc74feba take read_seqbegin_or_lock() and friends to seqlock.h ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-11-16 11:04:17 +0800

14 Nov, 2013

1 commit

5e30025a3 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core locking changes from Ingo Molnar:
"The biggest changes:

- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.

- move the various kernel locking primitives to the new
kernel/locking/ directory"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...

Linus Torvalds
2013-11-14 15:30:30 +0800

13 Nov, 2013

3 commits

ede4cebce prepend_path() needs to reinitialize dentry/vfsmount/mnt on restarts ... Browse Code »
2

... and equivalent is needed in 3.12; it's broken there as well

Signed-off-by: Al Viro

Al Viro
2013-11-13 20:45:40 +0800
4ec6c2aea fix unpaired rcu lock in prepend_path() ... Browse Code »

Signed-off-by: Li Zhong
Signed-off-by: Al Viro

Li Zhong
2013-11-13 20:43:10 +0800
9bc9ccd7d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:

- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes

plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...

Linus Torvalds
2013-11-13 14:34:18 +0800

09 Nov, 2013

7 commits

f80de2cde dcache: don't clear DCACHE_DISCONNECTED too early ... Browse Code »

DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem. It *shouldn't*
be cleared as soon as the dentry is connected to a parent. That will
cause bugs at least on exportable filesystems.

Acked-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
Signed-off-by: Al Viro

J. Bruce Fields
2013-11-09 13:16:34 +0800
e1a24bb0a dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries ... Browse Code »

I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885ab04dc6e0bb0ef35e0e23c1a7364d9e5 "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption the DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing. Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin
Acked-by: Christoph Hellwig
Signed-off-by: J. Bruce Fields
Signed-off-by: Al Viro

J. Bruce Fields
2013-11-09 13:16:33 +0800
7632e465f dcache: use IS_ROOT to decide where dentry is hashed ... Browse Code »

Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time. It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on. This
*does* work:

Dentries are hashed only by:

- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

- __d_rehash, called by _d_rehash: hashes to the dentry's
parent, and all callers of _d_rehash appear to have d_parent
set to a "real" parent.
- __d_rehash, called by __d_move: rehashes the moved dentry to
hash chain determined by target, and assigns target's d_parent
to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin
Acked-by: Christoph Hellwig
Reviewed-by: NeilBrown
Signed-off-by: J. Bruce Fields
Signed-off-by: Al Viro

J. Bruce Fields
2013-11-09 13:16:33 +0800
b18825a7c VFS: Put a small type field into struct dentry::d_flags ... Browse Code »
5

Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

Miss (negative dentry)
Directory
"Automount" directory (defective - no i_op->lookup())
Symlink
Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

d_set_type(dentry, type)
d_is_directory(dentry)
d_is_autodir(dentry)
d_is_symlink(dentry)
d_is_file(dentry)
d_is_negative(dentry)
d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2013-11-09 13:16:30 +0800
b61625d24 fold __d_shrink() into its only remaining caller ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:21 +0800
48a066e72 RCU'd vfsmounts ... Browse Code »

* RCU-delayed freeing of vfsmounts
* vfsmount_lock replaced with a seqlock (mount_lock)
* sequence number from mount_lock is stored in nameidata->m_seq and
used when we exit RCU mode
* new vfsmount flag - MNT_SYNC_UMOUNT. Set by umount_tree() when its
caller knows that vfsmount will have no surviving references.
* synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
and doing pending mntput().
* new helper: legitimize_mnt(mnt, seq). Checks the mount_lock sequence
number against seq, then grabs reference to mnt. Then it rechecks mount_lock
again to close the race and either returns success or drops the reference it
has acquired. The subtle point is that in case of MNT_SYNC_UMOUNT we can
simply decrement the refcount and sod off - aforementioned synchronize_rcu()
makes sure that final mntput() won't come until we leave RCU mode. We need
that, since we don't want to end up with some lazy pathwalk racing with
umount() and stealing the final mntput() from it - caller of umount() may
expect it to return only once the fs is shut down and we don't want to break
that. In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
full-blown mntput() in case of mount_lock sequence number mismatch happening
just as we'd grabbed the reference, but in those cases we won't be stealing
the final mntput() from anything that would care.
* mntput_no_expire() doesn't lock anything on the fast path now. Incidentally,
SMP and UP cases are handled the same way - no ifdefs there.
* normal pathname resolution does *not* do any writes to mount_lock. It does,
of course, bump the refcounts of vfsmount and dentry in the very end, but that's
it.

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:19 +0800
42c326082 switch shrink_dcache_for_umount() to use of d_walk() ... Browse Code »

we have too many iterators in fs/dcache.c...

Signed-off-by: Al Viro

Al Viro
2013-11-09 13:16:06 +0800

06 Nov, 2013

1 commit

1ca7d67cf seqcount: Add lockdep functionality to seqcount/seqlock structures ... Browse Code »
18

Currently seqlocks and seqcounts don't support lockdep.

After running across a seqcount related deadlock in the timekeeping
code, I used a less-refined and more focused variant of this patch
to narrow down the cause of the issue.

This is a first-pass attempt to properly enable lockdep functionality
on seqlocks and seqcounts.

Since seqcounts are used in the vdso gettimeofday code, I've provided
non-lockdep accessors for those needs.

I've also handled one case where there were nested seqlock writers
and there may be more edge cases.

Comments and feedback would be appreciated!

Signed-off-by: John Stultz
Signed-off-by: Peter Zijlstra
Cc: Eric Dumazet
Cc: Li Zefan
Cc: Mathieu Desnoyers
Cc: Steven Rostedt
Cc: "David S. Miller"
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar

John Stultz
2013-11-06 19:40:26 +0800

01 Nov, 2013

1 commit

358eec182 vfs: decrapify dput(), fix cache behavior under normal load ... Browse Code »

We do not want to dirty the dentry->d_flags cacheline in dput() just to
set the DCACHE_REFERENCED flag when it is already set in the common case
anyway. This way the first cacheline of the dentry (which contains the
RCU lookup information etc) can stay shared among multiple CPU's.

This finishes off some of the details of all the scalability patches
merged during the merge window.

Also don't mark dentry_kill() for inlining, since it's the uncommon path
and inlining it just makes the common path slower due to extra function
entry/exit overhead.

Signed-off-by: Linus Torvalds

Linus Torvalds
2013-11-01 06:43:02 +0800

25 Oct, 2013

2 commits

b70a80e7a vfs: introduce d_instantiate_no_diralias() ... Browse Code »

...which just returns -EBUSY if a directory alias would be created.

This is to be used by fuse mkdir to make sure that a buggy or malicious
userspace filesystem doesn't do anything nasty. Previously fuse used a
private mutex for this purpose, which can now go away.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2013-10-25 11:41:37 +0800
94e92a6e7 move taking vfsmount_lock down into prepend_path() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-10-25 11:35:01 +0800

22 Oct, 2013

1 commit

69c88dc7d vfs: fix new kernel-doc warnings ... Browse Code »

Move kernel-doc notation to immediately before its function to eliminate
kernel-doc warnings introduced by commit db14fc3abcd5 ("vfs: add
d_walk()")

Warning(fs/dcache.c:1343): No description found for parameter 'data'
Warning(fs/dcache.c:1343): No description found for parameter 'dentry'
Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount'

Signed-off-by: Randy Dunlap
Cc: Miklos Szeredi
Cc: Al Viro
Signed-off-by: Linus Torvalds

Randy Dunlap
2013-10-22 19:02:40 +0800

15 Sep, 2013

1 commit

05a8252bd vfs: fix typo in comment in recent dentry work ... Browse Code »

Sedat points out that I transposed some letters in "LRU" and wrote "RLU"
instead in one of the new comments explaining the flow. Let's just fix
it.

Reported-by: Sedat Dilek
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-09-15 19:11:01 +0800

14 Sep, 2013

1 commit

89dc77bcd vfs: fix dentry LRU list handling and nr_dentry_unused accounting ... Browse Code »

The LRU list changes interacted badly with our nr_dentry_unused
accounting, and even worse with the new DCACHE_LRU_LIST bit logic.

This introduces helper functions to make sure everything follows the
proper dcache d_lru list rules: the dentry cache is complicated by the
fact that some of the hotpaths don't even want to look at the LRU list
at all, and the fact that we use the same list entry in the dentry for
both the LRU list and for our temporary shrinking lists when removing
things from the LRU.

The helper functions temporarily have some extra sanity checking for the
flag bits that have to match the current LRU state of the dentry. We'll
remove that before the final 3.12 release, but considering how easy it
is to get wrong, this first cleanup version has some very particular
sanity checking.

Acked-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-09-14 10:55:10 +0800

13 Sep, 2013

7 commits

26935fb06 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs pile 4 from Al Viro:
"list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
super: fix for destroy lrus
list_lru: dynamically adjust node arrays
shrinker: Kill old ->shrink API.
shrinker: convert remaining shrinkers to count/scan API
staging/lustre/libcfs: cleanup linux-mem.h
staging/lustre/ptlrpc: convert to new shrinker API
staging/lustre/obdclass: convert lu_object shrinker to count/scan API
staging/lustre/ldlm: convert to shrinkers to count/scan API
hugepage: convert huge zero page shrinker to new shrinker API
i915: bail out earlier when shrinker cannot acquire mutex
drivers: convert shrinkers to new count/scan API
fs: convert fs shrinkers to new scan/count API
xfs: fix dquot isolation hang
xfs-convert-dquot-cache-lru-to-list_lru-fix
xfs: convert dquot cache lru to list_lru
xfs: rework buffer dispose list tracking
xfs-convert-buftarg-lru-to-generic-code-fix
xfs: convert buftarg LRU to generic code
fs: convert inode and dentry shrinking to be node aware
vmscan: per-node deferred work
...

Linus Torvalds
2013-09-13 06:01:38 +0800
68f0d9d92 vfs: make d_path() get the root path under RCU ... Browse Code »

This avoids the spinlocks and refcounts in the d_path() sequence too
(used by /proc and various other entities). See commit 8b19e34188a3 for
the equivalent getcwd() system call path.

And unlike getcwd(), d_path() doesn't copy the result to user space, so
I don't need to fear _that_ particular bug happening again.

Signed-off-by: Linus Torvalds

Linus Torvalds
2013-09-13 04:24:55 +0800
3272c544d vfs: use __getname/__putname for getcwd() system call ... Browse Code »

It's a pathname. It should use the pathname allocators and
deallocators, and PATH_MAX instead of PAGE_SIZE. Never mind that the
two are commonly the same.

With this, the allocations scale up nicely too, and I can do getcwd()
system calls at a rate of about 300M/s, with no lock contention
anywhere.

Of course, nobody sane does that, especially since getcwd() is
traditionally a very slow operation in Unix. But this was also the
simplest way to benchmark the prepend_path() improvements by Waiman, and
once I saw the profiles I couldn't leave it well enough alone.

But apart from being an performance improvement (from using per-cpu slab
allocators instead of the raw page allocator), it's actually a valid and
real cleanup.

Signed-off-by: Linus "OCD" Torvalds

Linus Torvalds
2013-09-13 03:40:15 +0800
ff812d724 vfs: don't copy things to user space holding the rcu readlock ... Browse Code »

Oops. That wasn't very smart. We don't actually need the RCU lock any
more by the time we copy the cwd string to user space, but I had
stupidly surrounded the whole thing with it.

Introduced by commit 8b19e34188a3 ("vfs: make getcwd() get the root and
pwd path under rcu")

Is-a-big-hairy-idiot: Linus Torvalds

Linus Torvalds
2013-09-13 02:57:01 +0800
8b19e3418 vfs: make getcwd() get the root and pwd path under rcu ... Browse Code »

This allows us to skip all the crazy spinlocks and reference count
updates, and instead use the fs sequence read-lock to get an atomic
snapshot of the root and cwd information.

We might want to make the rule that "prepend_path()" is always called
with the RCU lock held, but the RCU lock nests fine and this is the
minimal fix.

Signed-off-by: Linus Torvalds

Linus Torvalds
2013-09-13 01:35:47 +0800
5762482f5 vfs: move get_fs_root_and_pwd() to single caller ... Browse Code »

Let's not pollute the include files with inline functions that are only
used in a single place. Especially not if we decide we might want to
change the semantics of said function to make it more efficient..

Signed-off-by: Linus Torvalds

Linus Torvalds
2013-09-13 01:12:47 +0800
181299772 dcache: get/release read lock in read_seqbegin_or_lock() & friend ... Browse Code »

This patch modifies read_seqbegin_or_lock() and need_seqretry() to use
newly introduced read_seqlock_excl() and read_sequnlock_excl()
primitives so that they won't change the sequence number even if they
fall back to take the lock. This is OK as no change to the protected
data structure is being made.

It will prevent one fallback to lock taking from cascading into a series
of lock taking reducing performance because of the sequence number
change. It will also allow other sequence readers to go forward while
an exclusive reader lock is taken.

This patch also updates some of the inaccurate comments in the code.

Signed-off-by: Waiman Long
To: Alexander Viro
Signed-off-by: Linus Torvalds

Waiman Long
2013-09-13 00:25:23 +0800

11 Sep, 2013

2 commits

9b17c6238 fs: convert inode and dentry shrinking to be node aware ... Browse Code »

Now that the shrinker is passing a node in the scan control structure, we
can pass this to the the generic LRU list code to isolate reclaim to the
lists on matching nodes.

Signed-off-by: Dave Chinner
Signed-off-by: Glauber Costa
Acked-by: Mel Gorman
Cc: "Theodore Ts'o"
Cc: Adrian Hunter
Cc: Al Viro
Cc: Artem Bityutskiy
Cc: Arve Hjønnevåg
Cc: Carlos Maiolino
Cc: Christoph Hellwig
Cc: Chuck Lever
Cc: Daniel Vetter
Cc: David Rientjes
Cc: Gleb Natapov
Cc: Greg Thelen
Cc: J. Bruce Fields
Cc: Jan Kara
Cc: Jerome Glisse
Cc: John Stultz
Cc: KAMEZAWA Hiroyuki
Cc: Kent Overstreet
Cc: Kirill A. Shutemov
Cc: Marcelo Tosatti
Cc: Mel Gorman
Cc: Steven Whitehouse
Cc: Thomas Hellstrom
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Dave Chinner
2013-09-11 06:56:31 +0800
4e717f5c1 list_lru: remove special case function list_lru_dispose_all. ... Browse Code »

The list_lru implementation has one function, list_lru_dispose_all, with
only one user (the dentry code). At first, such function appears to make
sense because we are really not interested in the result of isolating each
dentry separately - all of them are going away anyway. However, it's
implementation is buggy in the following way:

When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries
marking them with DCACHE_SHRINK_LIST. However, this is done without the
nlru->lock taken. The imediate result of that is that someone else may
add or remove the dentry from the LRU at the same time. When list_lru_del
happens in that scenario we will see an element that is not yet marked
with DCACHE_SHRINK_LIST (even though it will be in the future) and
obviously remove it from an lru where the element no longer is. Since
list_lru_dispose_all will in effect count down nlru's nr_items and
list_lru_del will do the same, this will lead to an imbalance.

The solution for this would not be so simple: we can obviously just keep
the lru_lock taken, but then we have no guarantees that we will be able to
acquire the dentry lock (dentry->d_lock). To properly solve this, we need
a communication mechanism between the lru and dentry code, so they can
coordinate this with each other.

Such mechanism already exists in the form of the list_lru_walk_cb
callback. So it is possible to construct a dcache-side prune function
that does the right thing only by calling list_lru_walk in a loop until no
more dentries are available.

With only one user, plus the fact that a sane solution for the problem
would involve boucing between dcache and list_lru anyway, I see little
justification to keep the special case list_lru_dispose_all in tree.

Signed-off-by: Glauber Costa
Cc: Michal Hocko
Acked-by: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Glauber Costa
2013-09-11 06:56:31 +0800