Eric Lee / smarc-fsl-linux-kernel

25 Feb, 2014

1 commit

fed95bab8 sysfs: fix namespace refcnt leak ... Browse Code »

As mount() and kill_sb() is not a one-to-one match, we shoudn't get
ns refcnt unconditionally in sysfs_mount(), and instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.

v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.

v3:
- Make the new argument as second-to-last arg, suggested by Tejun.

Signed-off-by: Li Zefan
Acked-by: Tejun Heo
---
fs/kernfs/mount.c | 8 +++++++-
fs/sysfs/mount.c | 5 +++--
include/linux/kernfs.h | 9 +++++----
3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: Greg Kroah-Hartman

Li Zefan
2014-02-25 23:37:52 +0800

06 Feb, 2014

1 commit

da9846ae1 kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag ... Browse Code »

kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set
before performing lockdep annotations and ends up feeding
uninitialized lockdep_map to lockdep triggering warning like the
following on USB stick hotunplug.

usb 1-2: USB disconnect, device number 2
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002
ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002
ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710
Call Trace:
[] dump_stack+0x4e/0x7a
[] __lock_acquire+0x1910/0x1e70
[] lock_acquire+0x9a/0x1d0
[] kernfs_deactivate+0xee/0x130
[] kernfs_addrm_finish+0x38/0x60
[] kernfs_remove_by_name_ns+0x51/0xa0
[] remove_files.isra.1+0x41/0x80
[] sysfs_remove_group+0x47/0xa0
[] sysfs_remove_groups+0x33/0x50
[] device_remove_attrs+0x4d/0x80
[] device_del+0x12e/0x1d0
[] usb_disconnect+0x122/0x1a0
[] hub_thread+0x3c5/0x1290
[] kthread+0xed/0x110
[] ret_from_fork+0x7c/0xb0

Fix it by making kernfs_deactivate() perform lockdep annotations only
if KERNFS_LOCKDEP is set.

Signed-off-by: Tejun Heo
Reported-by: Fabio Estevam
Reported-by: Alan Stern
Reported-by: Jiri Kosina
Reported-by: Dave Jones
Tested-by: Fabio Estevam
Tested-by: Jiri Kosina
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-02-06 03:44:04 +0800

18 Jan, 2014

1 commit

db4aad209 kernfs: associate a new kernfs_node with its parent on creation ... Browse Code »

Once created, a kernfs_node is always destroyed by kernfs_put().
Since ba7443bc656e ("sysfs, kernfs: implement
kernfs_create/destroy_root()"), kernfs_put() depends on kernfs_root()
to locate the ino_ida. kernfs_root() in turn depends on
kernfs_node->parent being set for !dir nodes. This means that
kernfs_put() of a !dir node requires its ->parent to be initialized.

This leads to oops when a newly created !dir node is destroyed without
going through kernfs_add_one() or after failing kernfs_add_one()
before ->parent is set. kernfs_root() invoked from kernfs_put() will
try to dereference NULL parent.

Fix it by moving parent association to kernfs_new_node() from
kernfs_add_one(). kernfs_new_node() now takes @parent instead of
@root and determines the root from the parent and also sets the new
node's parent properly. @parent parameter is removed from
kernfs_add_one(). As there's no parent when creating the root node,
__kernfs_new_node() which takes @root as before and doesn't set the
parent is used in that case.

This ensures that a kernfs_node in any stage in its life has its
parent associated and thus can be put.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-18 03:50:07 +0800

15 Jan, 2014

1 commit

bb305947b kernfs: fix get_active failure handling in kernfs_seq_*() ... Browse Code »

When kernfs_seq_start() fails to obtain an active reference, it
returns ERR_PTR(-ENODEV). kernfs_seq_stop() is then invoked with the
error pointer value; however, it still proceeds to invoke
kernfs_put_active() on the node leading to unbalanced put.

If kernfs_seq_stop() is called even after active ref failure, it
should skip invocation of @ops->seq_stop() and put_active.
Unfortunately, this is a bit complicated because active ref failure
isn't the only thing which may fail with ERR_PTR(-ENODEV).
@ops->seq_start/next() may also fail with the error value and
kernfs_seq_stop() doesn't have a way to tell apart those failures.

Work it around by factoring out the active part of kernfs_seq_stop()
into kernfs_seq_stop_active() and invoking it directly if
@ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
kernfs_seq_stop() to skip kernfs_seq_stop_active() on
ERR_PTR(-ENODEV). This is a bit nasty but ensures that the active put
is skipped iff get_active failed in kernfs_seq_start().

tj: This was originally committed as d92d2e6bd72b but got reverted by
683bb2761fbf along with other kernfs self removal patches.
However, this one is an independent fix and shouldn't have been
reverted together. Reinstate the change. Sorry about the mess.

Signed-off-by: Tejun Heo
Cc: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-15 00:49:22 +0800

14 Jan, 2014

11 commits

683bb2761 Revert "kernfs: fix get_active failure handling in kernfs_seq_*()" ... Browse Code »

This reverts commit d92d2e6bd72b653f9811e0c9c46307c743b3fc58.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:49:01 +0800
87da14934 Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq" ... Browse Code »

This reverts commit ea1c472dfeada211a0100daa7976e8e8e779b858.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:43:11 +0800
0890147fe Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()" ... Browse Code »

This reverts commit a69d001cfc712b96ec9d7ba44d6285702a38dabf.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:39:52 +0800
798c75a0d Revert "kernfs: remove KERNFS_REMOVED" ... Browse Code »

This reverts commit ae34372eb8408b3d07e870f1939f99007a730d28.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:36:03 +0800
4f4b1b647 Revert "kernfs: restructure removal path to fix possible premature return" ... Browse Code »

This reverts commit 45a140e587f3d32d8d424ed940dffb61e1739047.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:30:47 +0800
55f6e30d0 Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()" ... Browse Code »

This reverts commit f601f9a2bf7dc1f7ee18feece4c4e2fc6845d6c4.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:27:16 +0800
7653fe9d6 Revert "kernfs: remove kernfs_addrm_cxt" ... Browse Code »

This reverts commit 99177a34110889a8f2c36420c34e3bcc9bfd8a70.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:20:56 +0800
f4b3e631b Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed" ... Browse Code »

This reverts commit 895a068a524e134900b9d98b519309b7aae7bbb1.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:13:39 +0800
9b0925a6f Revert "kernfs: implement kernfs_{de|re}activate[_self]()" ... Browse Code »

This reverts commit 9f010c2ad5194a4b682e747984477850fabd03be.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:09:38 +0800
a9f138b0e Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers" ... Browse Code »

This reverts commit 1ae06819c77cff1ea2833c94f8c093fe8a5c79db.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Cc: Alan Stern
Cc: kbuild test robot
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:05:13 +0800
ce9b499c9 Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()" ... Browse Code »

This reverts commit 88533f990c616cf50c2fe585ea03f75c806a293d.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 05:50:31 +0800

12 Jan, 2014

1 commit

88533f990 kernfs: remove unnecessary NULL check in __kernfs_remove() ... Browse Code »

895a068a524e ("kernfs: make kernfs_get_active() block if the node is
deactivated but not removed") added "struct kernfs_root *root =
kernfs_root(kn);" at the head of the function; however, the parameter
@kn is checked for later implying that the function may be called with
NULL. This means that we may end up invoking kernfs_root() with NULL
which will oops. None of the existing users invokes removal with NULL
@kn, so this bug doesn't actually trigger.

We can relocate kernfs_root() invocation after NULL check; however,
allowing NULL param tends to cause more confusion than actually
helping anything. As there's no existing user, let's remove the
spurious NULL check.

This bug was detected by smatch.

Signed-off-by: Tejun Heo
Reported-by: Dan Carpenter
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-12 07:40:19 +0800

11 Jan, 2014

10 commits

1ae06819c kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers ... Browse Code »

Sometimes it's necessary to implement a node which wants to delete
nodes including itself. This isn't straightforward because of kernfs
active reference. While a file operation is in progress, an active
reference is held and kernfs_remove() waits for all such references to
drain before completing. For a self-deleting node, this is a deadlock
as kernfs_remove() ends up waiting for an active reference that itself
is sitting on top of.

This currently is worked around in the sysfs layer using
sysfs_schedule_callback() which makes such removals asynchronous.
While it works, it's rather cumbersome and inherently breaks
synchronicity of the operation - the file operation which triggered
the operation may complete before the removal is finished (or even
started) and the removal may fail asynchronously. If a removal
operation is immmediately followed by another operation which expects
the specific name to be available (e.g. removal followed by rename
onto the same name), there's no way to make the latter operation
reliable.

The thing is there's no inherent reason for this to be asynchrnous.
All that's necessary to do this synchronous is a dedicated operation
which drops its own active ref and deactivates self. This patch
implements kernfs_remove_self() and its wrappers in sysfs and driver
core. kernfs_remove_self() is to be called from one of the file
operations, drops the active ref and deactivates using
__kernfs_deactivate_self(), removes the self node, and restores active
ref to the dead node using __kernfs_reactivate_self() so that the ref
is balanced afterwards. __kernfs_remove() is updated so that it takes
an early exit if the target node is already fully removed so that the
active ref restored by kernfs_remove_self() after removal doesn't
confuse the deactivation path.

This makes implementing self-deleting nodes very easy. The normal
removal path doesn't even need to be changed to use
kernfs_remove_self() for the self-deleting node. The method can
invoke kernfs_remove_self() on itself before proceeding the normal
removal path. kernfs_remove() invoked on the node by the normal
deletion path will simply be ignored.

This will replace sysfs_schedule_callback(). A subtle feature of
sysfs_schedule_callback() is that it collapses multiple invocations -
even if multiple removals are triggered, the removal callback is run
only once. An equivalent effect can be achieved by testing the return
value of kernfs_remove_self() - only the one which gets %true return
value should proceed with actual deletion. All other instances of
kernfs_remove_self() will wait till the enclosing kernfs operation
which invoked the winning instance of kernfs_remove_self() finishes
and then return %false. This trivially makes all users of
kernfs_remove_self() automatically show correct synchronous behavior
even when there are multiple concurrent operations - all "echo 1 >
delete" instances will finish only after the whole operation is
completed by one of the instances.

v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing
and sysfs_remove_file_self() had incorrect return type. Fix it.
Reported by kbuild test bot.

v3: Updated to use __kernfs_{de|re}activate_self().

Signed-off-by: Tejun Heo
Cc: Alan Stern
Cc: kbuild test robot
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 06:01:05 +0800
9f010c2ad kernfs: implement kernfs_{de|re}activate[_self]() ... Browse Code »

This patch implements four functions to manipulate deactivation state
- deactivate, reactivate and the _self suffixed pair. A new fields
kernfs_node->deact_depth is added so that concurrent and nested
deactivations are handled properly. kernfs_node->hash is moved so
that it's paired with the new field so that it doesn't increase the
size of kernfs_node.

A kernfs user's lock would normally nest inside active ref but during
removal the user may want to perform kernfs_remove() while holding the
said lock, which would introduce a reverse locking dependency. This
function can be used to break such reverse dependency by allowing
deactivation step to performed separately outside user's critical
section.

This will also be used implement kernfs_remove_self().

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:51:21 +0800
895a068a5 kernfs: make kernfs_get_active() block if the node is deactivated but not removed ... Browse Code »

Currently, kernfs_get_active() fails if the target node is
deactivated. This is fine as a node always gets removed after
deactivation; however, we're gonna add reactivation so the assumption
won't hold. It'd be incorrect for kernfs_get_active() to fail for a
node which was deactivated only temporarily.

This patch makes kernfs_get_active() block if the node is deactivated
but not removed. If the node gets reactivated (not yet implemented),
it will be retried and succeed. If the node gets removed, it will be
woken up and fail.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:48:08 +0800
99177a341 kernfs: remove kernfs_addrm_cxt ... Browse Code »

kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
added because there were operations which should be performed outside
kernfs_mutex after adding and removing kernfs_nodes. The necessary
operations were recorded in kernfs_addrm_cxt and performed by
kernfs_addrm_finish(); however, after the recent changes which
relocated deactivation and unmapping so that they're performed
directly during removal, the only operation kernfs_addrm_finish()
performs is kernfs_put(), which can be moved inside the removal path
too.

This patch moves the kernfs_put() of the base ref to __kernfs_remove()
and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().

* kernfs_add_one() is updated to grab and release the parent's active
ref and kernfs_mutex itself. kernfs_get/put_active() and
kernfs_addrm_start/finish() invocations around it are removed from
all users.

* __kernfs_remove() puts an unlinked node directly instead of chaining
it to kernfs_addrm_cxt. Its callers are updated to grab and release
kernfs_mutex instead of calling kernfs_addrm_start/finish() around
it.

v2: Updated to fit the v2 restructuring of removal path.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:48:08 +0800
f601f9a2b kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove() ... Browse Code »

kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
the target file before kernfs_remove() finishes; however, it currently
is being called from kernfs_addrm_finish() and has the same race
problem as the original implementation of deactivation when there are
multiple removers - only the remover which snatches the node to its
addrm_cxt->removed list is guaranteed to wait for its completion
before returning.

It can be fixed by moving kernfs_unmap_bin_file() invocation from
kernfs_addrm_finish() to __kernfs_remove(). The function may be
called multiple times but that shouldn't do any harm.

We end up dropping kernfs_mutex in the removal loop and the node may
be removed inbetween by someone else. kernfs_unlink_sibling() is
updated to test whether the node has already been removed and return
accordingly. __kernfs_remove() in turn performs post-unlinking
cleanup only if it actually unlinked the node.

KERNFS_HAS_MMAP test is moved out of the unmap function into
__kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily.
While at it, drop the now meaningless "bin" qualifier from the
function name.

v2: Rewritten to fit the v2 restructuring of removal path. HAS_MMAP
test relocated.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:48:08 +0800
45a140e58 kernfs: restructure removal path to fix possible premature return ... Browse Code »

The recursive nature of kernfs_remove() means that, even if
kernfs_remove() is not allowed to be called multiple times on the same
node, there may be race conditions between removal of parent and its
descendants. While we can claim that kernfs_remove() shouldn't be
called on one of the descendants while the removal of an ancestor is
in progress, such rule is unnecessarily restrictive and very difficult
to enforce. It's better to simply allow invoking kernfs_remove() as
the caller sees fit as long as the caller ensures that the node is
accessible.

The current behavior in such situations is broken. Whoever enters
removal path first takes the node off the hierarchy and then
deactivates. Following removers either return as soon as it notices
that it's not the first one or can't even find the target node as it
has already been removed from the hierarchy. In both cases, the
following removers may finish prematurely while the nodes which should
be removed and drained are still being processed by the first one.

This patch restructures so that multiple removers, whether through
recursion or direction invocation, always follow the following rules.

* When there are multiple concurrent removers, only one puts the base
ref.

* Regardless of which one puts the base ref, all removers are blocked
until the target node is fully deactivated and removed.

To achieve the above, removal path now first deactivates the subtree,
drains it and then unlinks one-by-one. __kernfs_deactivate() is
called directly from __kernfs_removal() and drops and regrabs
kernfs_mutex for each descendant to drain active refs. As this means
that multiple removers can enter __kernfs_deactivate() for the same
node, the function is updated so that it can handle multiple
deactivators of the same node - only one actually deactivates but all
wait till drain completion.

The restructured removal path guarantees that a removed node gets
unlinked only after the node is deactivated and drained. Combined
with proper multiple deactivator handling, this guarantees that any
invocation of kernfs_remove() returns only after the node itself and
all its descendants are deactivated, drained and removed.

v2: Draining separated into a separate loop (used to be in the same
loop as unlink) and done from __kernfs_deactivate(). This is to
allow exposing deactivation as a separate interface later.

Root node removal was broken in v1 patch. Fixed.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:48:08 +0800
ae34372eb kernfs: remove KERNFS_REMOVED ... Browse Code »

KERNFS_REMOVED is used to mark half-initialized and dying nodes so
that they don't show up in lookups and deny adding new nodes under or
renaming it; however, its role overlaps those of deactivation and
removal from rbtree.

It's necessary to deny addition of new children while removal is in
progress; however, this role considerably intersects with deactivation
- KERNFS_REMOVED prevents new children while deactivation prevents new
file operations. There's no reason to have them separate making
things more complex than necessary.

KERNFS_REMOVED is also used to decide whether a node is still visible
to vfs layer, which is rather redundant as equivalent determination
can be made by testing whether the node is on its parent's children
rbtree or not.

This patch removes KERNFS_REMOVED.

* Instead of KERNFS_REMOVED, each node now starts its life
deactivated. This means that we now use both atomic_add() and
atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler
generates an overflow warnings when negating INT_MIN as the negation
can't be represented as a positive number. Nothing is actually
broken but let's bump BIAS by one to avoid the warnings for archs
which negates the subtrahend..

* KERNFS_REMOVED tests in add and rename paths are replaced with
kernfs_get/put_active() of the target nodes. Due to the way the add
path is structured now, active ref handling is done in the callers
of kernfs_add_one(). This will be consolidated up later.

* kernfs_remove_one() is updated to deactivate instead of setting
KERNFS_REMOVED. This removes deactivation from kernfs_deactivate(),
which is now renamed to kernfs_drain().

* kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
dropped. A node which is removed from the children rbtree is not
included in the iteration in the first place. This means that a
node may be visible through vfs a bit longer - it's now also visible
after deactivation until the actual removal. This slightly enlarged
window difference doesn't make any difference to the userland.

* Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
checks on the active ref.

* Some comment style updates in the affected area.

v2: Reordered before removal path restructuring. kernfs_active()
dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE()
used in the lookup paths.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:44:25 +0800
a69d001cf kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep() ... Browse Code »

There currently are two mechanisms gating active ref lockdep
annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
The former disables lockdep annotations in kernfs_get/put_active()
while the latter disables all of kernfs_deactivate().

While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
deactivation step for non-file nodes, the benefit is marginal and it
needlessly diverges code paths. Let's drop KERNFS_ACTIVE_REF and use
KERNFS_LOCKDEP in kernfs_deactivate() too.

While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
flag so that it's more convenient and the related code can be compiled
out when not enabled.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:44:25 +0800
ea1c472df kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq ... Browse Code »

kernfs_node->u.completion is used to notify deactivation completion
from kernfs_put_active() to kernfs_deactivate(). We now allow
multiple racing removals of the same node and the current removal
scheme is no longer correct - kernfs_remove() invocation may return
before the node is properly deactivated if it races against another
removal. The removal path will be restructured to address the issue.

To help such restructure which requires supporting multiple waiters,
this patch replaces kernfs_node->u.completion with
kernfs_root->deactivate_waitq. This makes deactivation event
notifications share a per-root waitqueue_head; however, the wait path
is quite cold and this will also allow shaving one pointer off
kernfs_node.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:44:25 +0800
d92d2e6bd kernfs: fix get_active failure handling in kernfs_seq_*() ... Browse Code »

When kernfs_seq_start() fails to obtain an active reference, it
returns ERR_PTR(-ENODEV). kernfs_seq_stop() is then invoked with the
error pointer value; however, it still proceeds to invoke
kernfs_put_active() on the node leading to unbalanced put.

If kernfs_seq_stop() is called even after active ref failure, it
should skip invocation of @ops->seq_stop() and put_active.
Unfortunately, this is a bit complicated because active ref failure
isn't the only thing which may fail with ERR_PTR(-ENODEV).
@ops->seq_start/next() may also fail with the error value and
kernfs_seq_stop() doesn't have a way to tell apart those failures.

Work it around by factoring out the active part of kernfs_seq_stop()
into kernfs_seq_stop_active() and invoking it directly if
@ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
kernfs_seq_stop() to skip kernfs_seq_stop_active() on
ERR_PTR(-ENODEV). This is a bit nasty but ensures that the active put
is skipped iff get_active failed in kernfs_seq_start().

Signed-off-by: Tejun Heo
Cc: Sasha Levin
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-01-11 05:44:25 +0800

18 Dec, 2013

6 commits

80b9bbefc kernfs: add kernfs_dir_ops ... Browse Code »

Add support for mkdir(2), rmdir(2) and rename(2) syscalls. This is
implemented through optional kernfs_dir_ops callback table which can
be specified on kernfs_create_root(). An implemented callback is
invoked when the matching syscall is invoked.

As kernfs keep dcache syncs with internal representation and
revalidates dentries on each access, the implementation of these
methods is extremely simple. Each just discovers the relevant
kernfs_node(s) and invokes the requested callback which is allowed to
do any kernfs operations and the end result doesn't necessarily have
to match the expected semantics of the syscall.

This will be used to convert cgroup to use kernfs instead of its own
filesystem implementation.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-18 00:59:15 +0800
19bbb9262 kernfs: allow negative dentries ... Browse Code »

kernfs doesn't allow negative dentries - kernfs_iop_lookup() returns
ERR_PTR(-ENOENT) instead of NULL which short-circuits negative dentry
creation and kernfs's d_delete() callback, kernfs_dop_delete(),
returns 1 for all removed nodes. This in turn allows
kernfs_dop_revalidate() to assume that there's no negative dentry for
kernfs.

This worked fine for sysfs but kernfs is scheduled to grow mkdir(2)
support which depend on negative dentries. This patch updates so that
kernfs allows negative dentries. The required changes are almost
trivial - kernfs_iop_lookup() now returns NULL instead of
ERR_PTR(-ENOENT) when the target kernfs_node doesn't exist,
kernfs_dop_delete() is removed and kernfs_dop_revalidate() is updated
to check whether the target dentry is negative and request fresh
lookup if so.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-18 00:59:15 +0800
47a52e91f kernfs: update kernfs_rename_ns() to consider KERNFS_STATIC_NAME ... Browse Code »

kernfs_rename_ns() currently assumes that the target sysfs_dirent has
a copied name. This has been okay because sysfs supports rename only
for directories which always have copied names; however, there's
nothing in kernfs interface which calls for such restriction and
currently invoking kernfs_rename_ns() on a regular file leads to oops
because it ends up trying to kfree() a static name.

This patch updates kernfs_rename_ns() so that it skips kfree() of the
old name if it's static. This allows it to be used for all node
types.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-18 00:59:15 +0800
2063d608f kernfs: mark static names with KERNFS_STATIC_NAME ... Browse Code »

Because sysfs used struct attribute which are supposed to stay
constant, sysfs didn't copy names when creating regular files. The
specified string for name was supposed to stay constant. Such
distinction isn't inherent for kernfs. kernfs_create_file[_ns]()
should be able to take the same @name as kernfs_create_dir[_ns]()

As there can be huge number of sysfs attributes, we still want to be
able to use static names for sysfs attributes. This patch renames
kernfs_create_file_ns_key() to __kernfs_create_file() and adds
@name_is_static parameter so that the caller can explicitly indicate
that @name can be used without copying. kernfs is updated to use
KERNFS_STATIC_NAME to distinguish static and copied names.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-18 00:59:15 +0800
d0ae3d434 kernfs: add REMOVED check to create and rename paths ... Browse Code »

kernfs currently assumes that the caller doesn't try to create a new
node under a removed parent, rename a removed node, or move a node
under a removed node. While this works fine for sysfs, it'd be nice
to have protection against such cases especially given that kernfs is
planned to add support for mkdir, rmdir and rename requsts from
userland which may make race conditions more likely.

This patch updates create and rename paths to check REMOVED and fail
the operation with -ENOENT if performed on or towards removed nodes.
Note that remove path already has such check.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-18 00:59:15 +0800
bb8b9d095 kernfs: add @mode to kernfs_create_dir[_ns]() ... Browse Code »

sysfs assumed 0755 for all newly created directories and kernfs
inherited it. This assumption is unnecessarily restrictive and
inconsistent with kernfs_create_file[_ns](). This patch adds @mode
parameter to kernfs_create_dir[_ns]() and update uses in sysfs
accordingly. Among others, this will be useful for implementations of
the planned ->mkdir() method.

This patch doesn't introduce any behavior differences.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-18 00:59:15 +0800

12 Dec, 2013

6 commits

c637b8acb kernfs: s/sysfs/kernfs/ in internal functions and whatever is left ... Browse Code »

kernfs has just been separated out from sysfs and we're already in
full conflict mode. Nothing can make the situation any worse. Let's
take the chance to name things properly.

This patch performs the following renames.

* s/sysfs_*()/kernfs_*()/ in all internal functions
* s/sysfs/kernfs/ in internal strings, comments and whatever is remaining
* Uniformly rename various vfs operations so that they're consistently
named and distinguishable.

This patch is strictly rename only and doesn't introduce any
functional difference.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-12 09:39:20 +0800
a797bfc30 kernfs: s/sysfs/kernfs/ in global variables ... Browse Code »

kernfs has just been separated out from sysfs and we're already in
full conflict mode. Nothing can make the situation any worse. Let's
take the chance to name things properly.

This patch performs the following renames.

* s/sysfs_mutex/kernfs_mutex/
* s/sysfs_dentry_ops/kernfs_dops/
* s/sysfs_dir_operations/kernfs_dir_fops/
* s/sysfs_dir_inode_operations/kernfs_dir_iops/
* s/kernfs_file_operations/kernfs_file_fops/ - renamed for consistency
* s/sysfs_symlink_inode_operations/kernfs_symlink_iops/
* s/sysfs_aops/kernfs_aops/
* s/sysfs_backing_dev_info/kernfs_bdi/
* s/sysfs_inode_operations/kernfs_iops/
* s/sysfs_dir_cachep/kernfs_node_cache/
* s/sysfs_ops/kernfs_sops/

This patch is strictly rename only and doesn't introduce any
functional difference.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-12 09:39:20 +0800
df23fc39b kernfs: s/sysfs/kernfs/ in constants ... Browse Code »

kernfs has just been separated out from sysfs and we're already in
full conflict mode. Nothing can make the situation any worse. Let's
take the chance to name things properly.

This patch performs the following renames.

* s/SYSFS_DIR/KERNFS_DIR/
* s/SYSFS_KOBJ_ATTR/KERNFS_FILE/
* s/SYSFS_KOBJ_LINK/KERNFS_LINK/
* s/SYSFS_{TYPE_FLAGS}/KERNFS_{TYPE_FLAGS}/
* s/SYSFS_FLAG_{FLAG}/KERNFS_{FLAG}/
* s/sysfs_type()/kernfs_type()/
* s/SD_DEACTIVATED_BIAS/KN_DEACTIVATED_BIAS/

This patch is strictly rename only and doesn't introduce any
functional difference.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-12 09:39:20 +0800
c525aaddc kernfs: s/sysfs/kernfs/ in various data structures ... Browse Code »

kernfs has just been separated out from sysfs and we're already in
full conflict mode. Nothing can make the situation any worse. Let's
take the chance to name things properly.

This patch performs the following renames.

* s/sysfs_open_dirent/kernfs_open_node/
* s/sysfs_open_file/kernfs_open_file/
* s/sysfs_inode_attrs/kernfs_iattrs/
* s/sysfs_addrm_cxt/kernfs_addrm_cxt/
* s/sysfs_super_info/kernfs_super_info/
* s/sysfs_info()/kernfs_info()/
* s/sysfs_open_dirent_lock/kernfs_open_node_lock/
* s/sysfs_open_file_mutex/kernfs_open_file_mutex/
* s/sysfs_of()/kernfs_of()/

This patch is strictly rename only and doesn't introduce any
functional difference.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-12 09:39:20 +0800
adc5e8b58 kernfs: drop s_ prefix from kernfs_node members ... Browse Code »

kernfs has just been separated out from sysfs and we're already in
full conflict mode. Nothing can make the situation any worse. Let's
take the chance to name things properly.

s_ prefix for kernfs members is used inconsistently and a misnomer
now. It's not like kernfs_node is used widely across the kernel
making the ability to grep for the members particularly useful. Let's
just drop the prefix.

This patch is strictly rename only and doesn't introduce any
functional difference.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-12 07:43:48 +0800
324a56e16 kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly ... Browse Code »

kernfs has just been separated out from sysfs and we're already in
full conflict mode. Nothing can make the situation any worse. Let's
take the chance to name things properly.

This patch performs the following renames.

* s/sysfs_elem_dir/kernfs_elem_dir/
* s/sysfs_elem_symlink/kernfs_elem_symlink/
* s/sysfs_elem_attr/kernfs_elem_file/
* s/sysfs_dirent/kernfs_node/
* s/sd/kn/ in kernfs proper
* s/parent_sd/parent/
* s/target_sd/target/
* s/dir_sd/parent/
* s/to_sysfs_dirent()/rb_to_kn()/
* misc renames of local vars when they conflict with the above

Because md, mic and gpio dig into sysfs details, this patch ends up
modifying them. All are sysfs_dirent renames and trivial. While we
can avoid these by introducing a dummy wrapping struct sysfs_dirent
around kernfs_node, given the limited usage outside kernfs and sysfs
proper, I don't think such workaround is called for.

This patch is strictly rename only and doesn't introduce any
functional difference.

- mic / gpio renames were missing. Spotted by kbuild test robot.

Signed-off-by: Tejun Heo
Cc: Neil Brown
Cc: Linus Walleij
Cc: Ashutosh Dixit
Cc: kbuild test robot
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-12 07:28:36 +0800

11 Dec, 2013

1 commit

9b2db6e18 sysfs: bail early from kernfs_file_mmap() to avoid spurious lockdep warning ... Browse Code »

This is v3.14 fix for the same issue that a8b14744429f ("sysfs: give
different locking key to regular and bin files") addresses for v3.13.
Due to the extensive kernfs reorganization in v3.14 branch, the same
fix couldn't be ported as-is. The v3.13 fix was ignored while merging
it into v3.14 branch.

027a485d12e0 ("sysfs: use a separate locking class for open files
depending on mmap") assigned different lockdep key to
sysfs_open_file->mutex depending on whether the file implements mmap
or not in an attempt to avoid spurious lockdep warning caused by
merging of regular and bin file paths.

While this restored some of the original behavior of using different
locks (at least lockdep is concerned) for the different clases of
files. The restoration wasn't full because now the lockdep key
assignment depends on whether the file has mmap or not instead of
whether it's a regular file or not.

This means that bin files which don't implement mmap will get assigned
the same lockdep class as regular files. This is problematic because
file_operations for bin files still implements the mmap file operation
and checking whether the sysfs file actually implements mmap happens
in the file operation after grabbing @sysfs_open_file->mutex. We
still end up adding locking dependency from mmap locking to
sysfs_open_file->mutex to the regular file mutex which triggers
spurious circular locking warning.

For v3.13, a8b14744429f ("sysfs: give different locking key to regular
and bin files") fixed it by giving sysfs_open_file->mutex different
lockdep keys depending on whether the file is regular or bin instead
of whether mmap exists or not; however, due to the way sysfs is now
layered behind kernfs, this approach is no longer viable. kernfs can
tell whether a sysfs node has mmap implemented or not but can't tell
whether a bin file from a regular one.

This patch updates kernfs such that kernfs_file_mmap() checks
SYSFS_FLAG_HAS_MMAP and bail before grabbing sysfs_open_file->mutex so
that it doesn't add spurious locking dependency from mmap to
sysfs_open_file->mutex and changes sysfs so that it specifies
kernfs_ops->mmap iff the sysfs file implements mmap. Combined, this
ensures that sysfs_open_file->mutex is grabbed under mmap path iff the
sysfs file actually implements mmap. As sysfs_open_file->mutex is
already given a different lockdep key if mmap is implemented, this
removes the spurious locking dependency.

Signed-off-by: Tejun Heo
Reported-by: Dave Jones
Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2013-12-11 13:33:31 +0800

09 Dec, 2013

1 commit

21d71662f sysfs, kernfs: remove duplicated include from file.c ... Browse Code »

Remove duplicated include.

Signed-off-by: Wei Yongjun
Signed-off-by: Greg Kroah-Hartman

Wei Yongjun
2013-12-09 10:16:19 +0800