25 Feb, 2014

1 commit

  • As mount() and kill_sb() is not a one-to-one match, we shoudn't get
    ns refcnt unconditionally in sysfs_mount(), and instead we should
    get the refcnt only when kernfs_mount() allocated a new superblock.

    v2:
    - Changed the name of the new argument, suggested by Tejun.
    - Made the argument optional, suggested by Tejun.

    v3:
    - Make the new argument as second-to-last arg, suggested by Tejun.

    Signed-off-by: Li Zefan
    Acked-by: Tejun Heo
    ---
    fs/kernfs/mount.c | 8 +++++++-
    fs/sysfs/mount.c | 5 +++--
    include/linux/kernfs.h | 9 +++++----
    3 files changed, 15 insertions(+), 7 deletions(-)
    Signed-off-by: Greg Kroah-Hartman

    Li Zefan
     

06 Feb, 2014

1 commit

  • kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set
    before performing lockdep annotations and ends up feeding
    uninitialized lockdep_map to lockdep triggering warning like the
    following on USB stick hotunplug.

    usb 1-2: USB disconnect, device number 2
    INFO: trying to register non-static key.
    the code is fine but needs lockdep annotation.
    turning off the locking correctness validator.
    CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
    Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
    ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002
    ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002
    ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710
    Call Trace:
    [] dump_stack+0x4e/0x7a
    [] __lock_acquire+0x1910/0x1e70
    [] lock_acquire+0x9a/0x1d0
    [] kernfs_deactivate+0xee/0x130
    [] kernfs_addrm_finish+0x38/0x60
    [] kernfs_remove_by_name_ns+0x51/0xa0
    [] remove_files.isra.1+0x41/0x80
    [] sysfs_remove_group+0x47/0xa0
    [] sysfs_remove_groups+0x33/0x50
    [] device_remove_attrs+0x4d/0x80
    [] device_del+0x12e/0x1d0
    [] usb_disconnect+0x122/0x1a0
    [] hub_thread+0x3c5/0x1290
    [] kthread+0xed/0x110
    [] ret_from_fork+0x7c/0xb0

    Fix it by making kernfs_deactivate() perform lockdep annotations only
    if KERNFS_LOCKDEP is set.

    Signed-off-by: Tejun Heo
    Reported-by: Fabio Estevam
    Reported-by: Alan Stern
    Reported-by: Jiri Kosina
    Reported-by: Dave Jones
    Tested-by: Fabio Estevam
    Tested-by: Jiri Kosina
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

18 Jan, 2014

1 commit

  • Once created, a kernfs_node is always destroyed by kernfs_put().
    Since ba7443bc656e ("sysfs, kernfs: implement
    kernfs_create/destroy_root()"), kernfs_put() depends on kernfs_root()
    to locate the ino_ida. kernfs_root() in turn depends on
    kernfs_node->parent being set for !dir nodes. This means that
    kernfs_put() of a !dir node requires its ->parent to be initialized.

    This leads to oops when a newly created !dir node is destroyed without
    going through kernfs_add_one() or after failing kernfs_add_one()
    before ->parent is set. kernfs_root() invoked from kernfs_put() will
    try to dereference NULL parent.

    Fix it by moving parent association to kernfs_new_node() from
    kernfs_add_one(). kernfs_new_node() now takes @parent instead of
    @root and determines the root from the parent and also sets the new
    node's parent properly. @parent parameter is removed from
    kernfs_add_one(). As there's no parent when creating the root node,
    __kernfs_new_node() which takes @root as before and doesn't set the
    parent is used in that case.

    This ensures that a kernfs_node in any stage in its life has its
    parent associated and thus can be put.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

15 Jan, 2014

1 commit

  • When kernfs_seq_start() fails to obtain an active reference, it
    returns ERR_PTR(-ENODEV). kernfs_seq_stop() is then invoked with the
    error pointer value; however, it still proceeds to invoke
    kernfs_put_active() on the node leading to unbalanced put.

    If kernfs_seq_stop() is called even after active ref failure, it
    should skip invocation of @ops->seq_stop() and put_active.
    Unfortunately, this is a bit complicated because active ref failure
    isn't the only thing which may fail with ERR_PTR(-ENODEV).
    @ops->seq_start/next() may also fail with the error value and
    kernfs_seq_stop() doesn't have a way to tell apart those failures.

    Work it around by factoring out the active part of kernfs_seq_stop()
    into kernfs_seq_stop_active() and invoking it directly if
    @ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
    kernfs_seq_stop() to skip kernfs_seq_stop_active() on
    ERR_PTR(-ENODEV). This is a bit nasty but ensures that the active put
    is skipped iff get_active failed in kernfs_seq_start().

    tj: This was originally committed as d92d2e6bd72b but got reverted by
    683bb2761fbf along with other kernfs self removal patches.
    However, this one is an independent fix and shouldn't have been
    reverted together. Reinstate the change. Sorry about the mess.

    Signed-off-by: Tejun Heo
    Cc: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

14 Jan, 2014

11 commits

  • This reverts commit d92d2e6bd72b653f9811e0c9c46307c743b3fc58.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit ea1c472dfeada211a0100daa7976e8e8e779b858.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit a69d001cfc712b96ec9d7ba44d6285702a38dabf.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit ae34372eb8408b3d07e870f1939f99007a730d28.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit 45a140e587f3d32d8d424ed940dffb61e1739047.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit f601f9a2bf7dc1f7ee18feece4c4e2fc6845d6c4.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit 99177a34110889a8f2c36420c34e3bcc9bfd8a70.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit 895a068a524e134900b9d98b519309b7aae7bbb1.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit 9f010c2ad5194a4b682e747984477850fabd03be.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit 1ae06819c77cff1ea2833c94f8c093fe8a5c79db.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Cc: Alan Stern
    Cc: kbuild test robot
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit 88533f990c616cf50c2fe585ea03f75c806a293d.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

12 Jan, 2014

1 commit

  • 895a068a524e ("kernfs: make kernfs_get_active() block if the node is
    deactivated but not removed") added "struct kernfs_root *root =
    kernfs_root(kn);" at the head of the function; however, the parameter
    @kn is checked for later implying that the function may be called with
    NULL. This means that we may end up invoking kernfs_root() with NULL
    which will oops. None of the existing users invokes removal with NULL
    @kn, so this bug doesn't actually trigger.

    We can relocate kernfs_root() invocation after NULL check; however,
    allowing NULL param tends to cause more confusion than actually
    helping anything. As there's no existing user, let's remove the
    spurious NULL check.

    This bug was detected by smatch.

    Signed-off-by: Tejun Heo
    Reported-by: Dan Carpenter
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

11 Jan, 2014

10 commits

  • Sometimes it's necessary to implement a node which wants to delete
    nodes including itself. This isn't straightforward because of kernfs
    active reference. While a file operation is in progress, an active
    reference is held and kernfs_remove() waits for all such references to
    drain before completing. For a self-deleting node, this is a deadlock
    as kernfs_remove() ends up waiting for an active reference that itself
    is sitting on top of.

    This currently is worked around in the sysfs layer using
    sysfs_schedule_callback() which makes such removals asynchronous.
    While it works, it's rather cumbersome and inherently breaks
    synchronicity of the operation - the file operation which triggered
    the operation may complete before the removal is finished (or even
    started) and the removal may fail asynchronously. If a removal
    operation is immmediately followed by another operation which expects
    the specific name to be available (e.g. removal followed by rename
    onto the same name), there's no way to make the latter operation
    reliable.

    The thing is there's no inherent reason for this to be asynchrnous.
    All that's necessary to do this synchronous is a dedicated operation
    which drops its own active ref and deactivates self. This patch
    implements kernfs_remove_self() and its wrappers in sysfs and driver
    core. kernfs_remove_self() is to be called from one of the file
    operations, drops the active ref and deactivates using
    __kernfs_deactivate_self(), removes the self node, and restores active
    ref to the dead node using __kernfs_reactivate_self() so that the ref
    is balanced afterwards. __kernfs_remove() is updated so that it takes
    an early exit if the target node is already fully removed so that the
    active ref restored by kernfs_remove_self() after removal doesn't
    confuse the deactivation path.

    This makes implementing self-deleting nodes very easy. The normal
    removal path doesn't even need to be changed to use
    kernfs_remove_self() for the self-deleting node. The method can
    invoke kernfs_remove_self() on itself before proceeding the normal
    removal path. kernfs_remove() invoked on the node by the normal
    deletion path will simply be ignored.

    This will replace sysfs_schedule_callback(). A subtle feature of
    sysfs_schedule_callback() is that it collapses multiple invocations -
    even if multiple removals are triggered, the removal callback is run
    only once. An equivalent effect can be achieved by testing the return
    value of kernfs_remove_self() - only the one which gets %true return
    value should proceed with actual deletion. All other instances of
    kernfs_remove_self() will wait till the enclosing kernfs operation
    which invoked the winning instance of kernfs_remove_self() finishes
    and then return %false. This trivially makes all users of
    kernfs_remove_self() automatically show correct synchronous behavior
    even when there are multiple concurrent operations - all "echo 1 >
    delete" instances will finish only after the whole operation is
    completed by one of the instances.

    v2: For !CONFIG_SYSFS, dummy version kernfs_remove_self() was missing
    and sysfs_remove_file_self() had incorrect return type. Fix it.
    Reported by kbuild test bot.

    v3: Updated to use __kernfs_{de|re}activate_self().

    Signed-off-by: Tejun Heo
    Cc: Alan Stern
    Cc: kbuild test robot
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • This patch implements four functions to manipulate deactivation state
    - deactivate, reactivate and the _self suffixed pair. A new fields
    kernfs_node->deact_depth is added so that concurrent and nested
    deactivations are handled properly. kernfs_node->hash is moved so
    that it's paired with the new field so that it doesn't increase the
    size of kernfs_node.

    A kernfs user's lock would normally nest inside active ref but during
    removal the user may want to perform kernfs_remove() while holding the
    said lock, which would introduce a reverse locking dependency. This
    function can be used to break such reverse dependency by allowing
    deactivation step to performed separately outside user's critical
    section.

    This will also be used implement kernfs_remove_self().

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • Currently, kernfs_get_active() fails if the target node is
    deactivated. This is fine as a node always gets removed after
    deactivation; however, we're gonna add reactivation so the assumption
    won't hold. It'd be incorrect for kernfs_get_active() to fail for a
    node which was deactivated only temporarily.

    This patch makes kernfs_get_active() block if the node is deactivated
    but not removed. If the node gets reactivated (not yet implemented),
    it will be retried and succeed. If the node gets removed, it will be
    woken up and fail.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
    added because there were operations which should be performed outside
    kernfs_mutex after adding and removing kernfs_nodes. The necessary
    operations were recorded in kernfs_addrm_cxt and performed by
    kernfs_addrm_finish(); however, after the recent changes which
    relocated deactivation and unmapping so that they're performed
    directly during removal, the only operation kernfs_addrm_finish()
    performs is kernfs_put(), which can be moved inside the removal path
    too.

    This patch moves the kernfs_put() of the base ref to __kernfs_remove()
    and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().

    * kernfs_add_one() is updated to grab and release the parent's active
    ref and kernfs_mutex itself. kernfs_get/put_active() and
    kernfs_addrm_start/finish() invocations around it are removed from
    all users.

    * __kernfs_remove() puts an unlinked node directly instead of chaining
    it to kernfs_addrm_cxt. Its callers are updated to grab and release
    kernfs_mutex instead of calling kernfs_addrm_start/finish() around
    it.

    v2: Updated to fit the v2 restructuring of removal path.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
    the target file before kernfs_remove() finishes; however, it currently
    is being called from kernfs_addrm_finish() and has the same race
    problem as the original implementation of deactivation when there are
    multiple removers - only the remover which snatches the node to its
    addrm_cxt->removed list is guaranteed to wait for its completion
    before returning.

    It can be fixed by moving kernfs_unmap_bin_file() invocation from
    kernfs_addrm_finish() to __kernfs_remove(). The function may be
    called multiple times but that shouldn't do any harm.

    We end up dropping kernfs_mutex in the removal loop and the node may
    be removed inbetween by someone else. kernfs_unlink_sibling() is
    updated to test whether the node has already been removed and return
    accordingly. __kernfs_remove() in turn performs post-unlinking
    cleanup only if it actually unlinked the node.

    KERNFS_HAS_MMAP test is moved out of the unmap function into
    __kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily.
    While at it, drop the now meaningless "bin" qualifier from the
    function name.

    v2: Rewritten to fit the v2 restructuring of removal path. HAS_MMAP
    test relocated.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • The recursive nature of kernfs_remove() means that, even if
    kernfs_remove() is not allowed to be called multiple times on the same
    node, there may be race conditions between removal of parent and its
    descendants. While we can claim that kernfs_remove() shouldn't be
    called on one of the descendants while the removal of an ancestor is
    in progress, such rule is unnecessarily restrictive and very difficult
    to enforce. It's better to simply allow invoking kernfs_remove() as
    the caller sees fit as long as the caller ensures that the node is
    accessible.

    The current behavior in such situations is broken. Whoever enters
    removal path first takes the node off the hierarchy and then
    deactivates. Following removers either return as soon as it notices
    that it's not the first one or can't even find the target node as it
    has already been removed from the hierarchy. In both cases, the
    following removers may finish prematurely while the nodes which should
    be removed and drained are still being processed by the first one.

    This patch restructures so that multiple removers, whether through
    recursion or direction invocation, always follow the following rules.

    * When there are multiple concurrent removers, only one puts the base
    ref.

    * Regardless of which one puts the base ref, all removers are blocked
    until the target node is fully deactivated and removed.

    To achieve the above, removal path now first deactivates the subtree,
    drains it and then unlinks one-by-one. __kernfs_deactivate() is
    called directly from __kernfs_removal() and drops and regrabs
    kernfs_mutex for each descendant to drain active refs. As this means
    that multiple removers can enter __kernfs_deactivate() for the same
    node, the function is updated so that it can handle multiple
    deactivators of the same node - only one actually deactivates but all
    wait till drain completion.

    The restructured removal path guarantees that a removed node gets
    unlinked only after the node is deactivated and drained. Combined
    with proper multiple deactivator handling, this guarantees that any
    invocation of kernfs_remove() returns only after the node itself and
    all its descendants are deactivated, drained and removed.

    v2: Draining separated into a separate loop (used to be in the same
    loop as unlink) and done from __kernfs_deactivate(). This is to
    allow exposing deactivation as a separate interface later.

    Root node removal was broken in v1 patch. Fixed.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • KERNFS_REMOVED is used to mark half-initialized and dying nodes so
    that they don't show up in lookups and deny adding new nodes under or
    renaming it; however, its role overlaps those of deactivation and
    removal from rbtree.

    It's necessary to deny addition of new children while removal is in
    progress; however, this role considerably intersects with deactivation
    - KERNFS_REMOVED prevents new children while deactivation prevents new
    file operations. There's no reason to have them separate making
    things more complex than necessary.

    KERNFS_REMOVED is also used to decide whether a node is still visible
    to vfs layer, which is rather redundant as equivalent determination
    can be made by testing whether the node is on its parent's children
    rbtree or not.

    This patch removes KERNFS_REMOVED.

    * Instead of KERNFS_REMOVED, each node now starts its life
    deactivated. This means that we now use both atomic_add() and
    atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler
    generates an overflow warnings when negating INT_MIN as the negation
    can't be represented as a positive number. Nothing is actually
    broken but let's bump BIAS by one to avoid the warnings for archs
    which negates the subtrahend..

    * KERNFS_REMOVED tests in add and rename paths are replaced with
    kernfs_get/put_active() of the target nodes. Due to the way the add
    path is structured now, active ref handling is done in the callers
    of kernfs_add_one(). This will be consolidated up later.

    * kernfs_remove_one() is updated to deactivate instead of setting
    KERNFS_REMOVED. This removes deactivation from kernfs_deactivate(),
    which is now renamed to kernfs_drain().

    * kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
    KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
    dropped. A node which is removed from the children rbtree is not
    included in the iteration in the first place. This means that a
    node may be visible through vfs a bit longer - it's now also visible
    after deactivation until the actual removal. This slightly enlarged
    window difference doesn't make any difference to the userland.

    * Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
    checks on the active ref.

    * Some comment style updates in the affected area.

    v2: Reordered before removal path restructuring. kernfs_active()
    dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE()
    used in the lookup paths.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • There currently are two mechanisms gating active ref lockdep
    annotations - KERNFS_LOCKDEP flag and KERNFS_ACTIVE_REF type mask.
    The former disables lockdep annotations in kernfs_get/put_active()
    while the latter disables all of kernfs_deactivate().

    While KERNFS_ACTIVE_REF also behaves as an optimization to skip the
    deactivation step for non-file nodes, the benefit is marginal and it
    needlessly diverges code paths. Let's drop KERNFS_ACTIVE_REF and use
    KERNFS_LOCKDEP in kernfs_deactivate() too.

    While at it, add a test helper kernfs_lockdep() to test KERNFS_LOCKDEP
    flag so that it's more convenient and the related code can be compiled
    out when not enabled.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_node->u.completion is used to notify deactivation completion
    from kernfs_put_active() to kernfs_deactivate(). We now allow
    multiple racing removals of the same node and the current removal
    scheme is no longer correct - kernfs_remove() invocation may return
    before the node is properly deactivated if it races against another
    removal. The removal path will be restructured to address the issue.

    To help such restructure which requires supporting multiple waiters,
    this patch replaces kernfs_node->u.completion with
    kernfs_root->deactivate_waitq. This makes deactivation event
    notifications share a per-root waitqueue_head; however, the wait path
    is quite cold and this will also allow shaving one pointer off
    kernfs_node.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • When kernfs_seq_start() fails to obtain an active reference, it
    returns ERR_PTR(-ENODEV). kernfs_seq_stop() is then invoked with the
    error pointer value; however, it still proceeds to invoke
    kernfs_put_active() on the node leading to unbalanced put.

    If kernfs_seq_stop() is called even after active ref failure, it
    should skip invocation of @ops->seq_stop() and put_active.
    Unfortunately, this is a bit complicated because active ref failure
    isn't the only thing which may fail with ERR_PTR(-ENODEV).
    @ops->seq_start/next() may also fail with the error value and
    kernfs_seq_stop() doesn't have a way to tell apart those failures.

    Work it around by factoring out the active part of kernfs_seq_stop()
    into kernfs_seq_stop_active() and invoking it directly if
    @ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
    kernfs_seq_stop() to skip kernfs_seq_stop_active() on
    ERR_PTR(-ENODEV). This is a bit nasty but ensures that the active put
    is skipped iff get_active failed in kernfs_seq_start().

    Signed-off-by: Tejun Heo
    Cc: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

18 Dec, 2013

6 commits

  • Add support for mkdir(2), rmdir(2) and rename(2) syscalls. This is
    implemented through optional kernfs_dir_ops callback table which can
    be specified on kernfs_create_root(). An implemented callback is
    invoked when the matching syscall is invoked.

    As kernfs keep dcache syncs with internal representation and
    revalidates dentries on each access, the implementation of these
    methods is extremely simple. Each just discovers the relevant
    kernfs_node(s) and invokes the requested callback which is allowed to
    do any kernfs operations and the end result doesn't necessarily have
    to match the expected semantics of the syscall.

    This will be used to convert cgroup to use kernfs instead of its own
    filesystem implementation.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs doesn't allow negative dentries - kernfs_iop_lookup() returns
    ERR_PTR(-ENOENT) instead of NULL which short-circuits negative dentry
    creation and kernfs's d_delete() callback, kernfs_dop_delete(),
    returns 1 for all removed nodes. This in turn allows
    kernfs_dop_revalidate() to assume that there's no negative dentry for
    kernfs.

    This worked fine for sysfs but kernfs is scheduled to grow mkdir(2)
    support which depend on negative dentries. This patch updates so that
    kernfs allows negative dentries. The required changes are almost
    trivial - kernfs_iop_lookup() now returns NULL instead of
    ERR_PTR(-ENOENT) when the target kernfs_node doesn't exist,
    kernfs_dop_delete() is removed and kernfs_dop_revalidate() is updated
    to check whether the target dentry is negative and request fresh
    lookup if so.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_rename_ns() currently assumes that the target sysfs_dirent has
    a copied name. This has been okay because sysfs supports rename only
    for directories which always have copied names; however, there's
    nothing in kernfs interface which calls for such restriction and
    currently invoking kernfs_rename_ns() on a regular file leads to oops
    because it ends up trying to kfree() a static name.

    This patch updates kernfs_rename_ns() so that it skips kfree() of the
    old name if it's static. This allows it to be used for all node
    types.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • Because sysfs used struct attribute which are supposed to stay
    constant, sysfs didn't copy names when creating regular files. The
    specified string for name was supposed to stay constant. Such
    distinction isn't inherent for kernfs. kernfs_create_file[_ns]()
    should be able to take the same @name as kernfs_create_dir[_ns]()

    As there can be huge number of sysfs attributes, we still want to be
    able to use static names for sysfs attributes. This patch renames
    kernfs_create_file_ns_key() to __kernfs_create_file() and adds
    @name_is_static parameter so that the caller can explicitly indicate
    that @name can be used without copying. kernfs is updated to use
    KERNFS_STATIC_NAME to distinguish static and copied names.

    This patch doesn't introduce any behavior changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs currently assumes that the caller doesn't try to create a new
    node under a removed parent, rename a removed node, or move a node
    under a removed node. While this works fine for sysfs, it'd be nice
    to have protection against such cases especially given that kernfs is
    planned to add support for mkdir, rmdir and rename requsts from
    userland which may make race conditions more likely.

    This patch updates create and rename paths to check REMOVED and fail
    the operation with -ENOENT if performed on or towards removed nodes.
    Note that remove path already has such check.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • sysfs assumed 0755 for all newly created directories and kernfs
    inherited it. This assumption is unnecessarily restrictive and
    inconsistent with kernfs_create_file[_ns](). This patch adds @mode
    parameter to kernfs_create_dir[_ns]() and update uses in sysfs
    accordingly. Among others, this will be useful for implementations of
    the planned ->mkdir() method.

    This patch doesn't introduce any behavior differences.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

12 Dec, 2013

6 commits

  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_*()/kernfs_*()/ in all internal functions
    * s/sysfs/kernfs/ in internal strings, comments and whatever is remaining
    * Uniformly rename various vfs operations so that they're consistently
    named and distinguishable.

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_mutex/kernfs_mutex/
    * s/sysfs_dentry_ops/kernfs_dops/
    * s/sysfs_dir_operations/kernfs_dir_fops/
    * s/sysfs_dir_inode_operations/kernfs_dir_iops/
    * s/kernfs_file_operations/kernfs_file_fops/ - renamed for consistency
    * s/sysfs_symlink_inode_operations/kernfs_symlink_iops/
    * s/sysfs_aops/kernfs_aops/
    * s/sysfs_backing_dev_info/kernfs_bdi/
    * s/sysfs_inode_operations/kernfs_iops/
    * s/sysfs_dir_cachep/kernfs_node_cache/
    * s/sysfs_ops/kernfs_sops/

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/SYSFS_DIR/KERNFS_DIR/
    * s/SYSFS_KOBJ_ATTR/KERNFS_FILE/
    * s/SYSFS_KOBJ_LINK/KERNFS_LINK/
    * s/SYSFS_{TYPE_FLAGS}/KERNFS_{TYPE_FLAGS}/
    * s/SYSFS_FLAG_{FLAG}/KERNFS_{FLAG}/
    * s/sysfs_type()/kernfs_type()/
    * s/SD_DEACTIVATED_BIAS/KN_DEACTIVATED_BIAS/

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_open_dirent/kernfs_open_node/
    * s/sysfs_open_file/kernfs_open_file/
    * s/sysfs_inode_attrs/kernfs_iattrs/
    * s/sysfs_addrm_cxt/kernfs_addrm_cxt/
    * s/sysfs_super_info/kernfs_super_info/
    * s/sysfs_info()/kernfs_info()/
    * s/sysfs_open_dirent_lock/kernfs_open_node_lock/
    * s/sysfs_open_file_mutex/kernfs_open_file_mutex/
    * s/sysfs_of()/kernfs_of()/

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    s_ prefix for kernfs members is used inconsistently and a misnomer
    now. It's not like kernfs_node is used widely across the kernel
    making the ability to grep for the members particularly useful. Let's
    just drop the prefix.

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_elem_dir/kernfs_elem_dir/
    * s/sysfs_elem_symlink/kernfs_elem_symlink/
    * s/sysfs_elem_attr/kernfs_elem_file/
    * s/sysfs_dirent/kernfs_node/
    * s/sd/kn/ in kernfs proper
    * s/parent_sd/parent/
    * s/target_sd/target/
    * s/dir_sd/parent/
    * s/to_sysfs_dirent()/rb_to_kn()/
    * misc renames of local vars when they conflict with the above

    Because md, mic and gpio dig into sysfs details, this patch ends up
    modifying them. All are sysfs_dirent renames and trivial. While we
    can avoid these by introducing a dummy wrapping struct sysfs_dirent
    around kernfs_node, given the limited usage outside kernfs and sysfs
    proper, I don't think such workaround is called for.

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    - mic / gpio renames were missing. Spotted by kbuild test robot.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Cc: Linus Walleij
    Cc: Ashutosh Dixit
    Cc: kbuild test robot
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

11 Dec, 2013

1 commit

  • This is v3.14 fix for the same issue that a8b14744429f ("sysfs: give
    different locking key to regular and bin files") addresses for v3.13.
    Due to the extensive kernfs reorganization in v3.14 branch, the same
    fix couldn't be ported as-is. The v3.13 fix was ignored while merging
    it into v3.14 branch.

    027a485d12e0 ("sysfs: use a separate locking class for open files
    depending on mmap") assigned different lockdep key to
    sysfs_open_file->mutex depending on whether the file implements mmap
    or not in an attempt to avoid spurious lockdep warning caused by
    merging of regular and bin file paths.

    While this restored some of the original behavior of using different
    locks (at least lockdep is concerned) for the different clases of
    files. The restoration wasn't full because now the lockdep key
    assignment depends on whether the file has mmap or not instead of
    whether it's a regular file or not.

    This means that bin files which don't implement mmap will get assigned
    the same lockdep class as regular files. This is problematic because
    file_operations for bin files still implements the mmap file operation
    and checking whether the sysfs file actually implements mmap happens
    in the file operation after grabbing @sysfs_open_file->mutex. We
    still end up adding locking dependency from mmap locking to
    sysfs_open_file->mutex to the regular file mutex which triggers
    spurious circular locking warning.

    For v3.13, a8b14744429f ("sysfs: give different locking key to regular
    and bin files") fixed it by giving sysfs_open_file->mutex different
    lockdep keys depending on whether the file is regular or bin instead
    of whether mmap exists or not; however, due to the way sysfs is now
    layered behind kernfs, this approach is no longer viable. kernfs can
    tell whether a sysfs node has mmap implemented or not but can't tell
    whether a bin file from a regular one.

    This patch updates kernfs such that kernfs_file_mmap() checks
    SYSFS_FLAG_HAS_MMAP and bail before grabbing sysfs_open_file->mutex so
    that it doesn't add spurious locking dependency from mmap to
    sysfs_open_file->mutex and changes sysfs so that it specifies
    kernfs_ops->mmap iff the sysfs file implements mmap. Combined, this
    ensures that sysfs_open_file->mutex is grabbed under mmap path iff the
    sysfs file actually implements mmap. As sysfs_open_file->mutex is
    already given a different lockdep key if mmap is implemented, this
    removes the spurious locking dependency.

    Signed-off-by: Tejun Heo
    Reported-by: Dave Jones
    Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

09 Dec, 2013

1 commit