02 Nov, 2011

1 commit

  • In sysfs_rename we need to remove the optimization of not calling
    sysfs_unlink_sibling and sysfs_link_sibling if the renamed parent
    directory is not changing. This optimization is no longer valid now
    that sysfs dirents are stored in an rbtree sorted by name.

    Move the assignment of s_ns before the call of sysfs_link_sibling. With
    no sysfs_dirent fields changing after the call of sysfs_link_sibling
    this allows sysfs_link_sibling to take any of the directory entries into
    account when it builds the rbtrees, and s_ns looks like a prime candidate
    to be used in the rbtree in the future.

    Signed-off-by: Eric W. Biederman
    Cc: Jiri Slaby
    Cc: Greg KH
    Cc: David Miller
    Cc: Mikulas Patocka
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
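
Why the unlink/relink pair cannot be skipped once siblings are sorted by name can be shown with a small stand-alone sketch. It uses a sorted array rather than an rbtree, and every `toy_` name is hypothetical, not taken from fs/sysfs/dir.c. The point is only that changing a key in place breaks the ordering that lookups rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Hypothetical stand-in for a directory whose children are sorted by name. */
struct toy_dir {
    const char *names[TOY_MAX_ENTRIES];
    size_t count;
};

/* Binary search; returns the index of name, or -1 if it is not found. */
int toy_find(const struct toy_dir *dir, const char *name)
{
    size_t lo = 0, hi = dir->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, dir->names[mid]);

        if (cmp == 0)
            return (int)mid;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Insert name at its sorted position (the analogue of linking a sibling). */
void toy_link(struct toy_dir *dir, const char *name)
{
    size_t i = dir->count;

    assert(dir->count < TOY_MAX_ENTRIES);
    while (i > 0 && strcmp(dir->names[i - 1], name) > 0) {
        dir->names[i] = dir->names[i - 1];
        i--;
    }
    dir->names[i] = name;
    dir->count++;
}

/* Remove the entry at idx (the analogue of unlinking a sibling). */
void toy_unlink(struct toy_dir *dir, size_t idx)
{
    assert(idx < dir->count);
    memmove(&dir->names[idx], &dir->names[idx + 1],
            (dir->count - idx - 1) * sizeof(dir->names[0]));
    dir->count--;
}

/* Rename inside one directory: unlink first, then relink under the new key. */
int toy_rename(struct toy_dir *dir, const char *old_name, const char *new_name)
{
    int idx = toy_find(dir, old_name);

    if (idx < 0)
        return -1;
    toy_unlink(dir, (size_t)idx);
    toy_link(dir, new_name);
    return 0;
}
```

Overwriting `names[idx]` directly instead of calling `toy_rename` leaves the array unsorted, and a later `toy_find` for the new name can miss it even though the parent never changed.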
     

25 Oct, 2011

2 commits

  • In commit 8a9ea3237e7e ("Merge git://.../davem/net-next") where my sysfs
    changes from the net tree merged with the sysfs rbtree changes from
    Mikulas Patocka, the conflict resolution failed to preserve the
    simplified property that was the point of my changes.

    That is, sysfs_find_dirent can now say something is a match if and only if
    s_name and s_ns match what we are looking for, and sysfs_readdir can
    simply return all of the directory entries where s_ns matches the
    directory that we should be returning.

    Now that we are back to exact matches we can tweak sysfs_find_dirent and
    the name rb_tree to order sysfs_dirents by s_ns and then s_name, and remove the
    second loop in sysfs_find_dirent. However that change seems a bit much
    for a conflict resolution so it can come later.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
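
The exact-match rule described in this message might look roughly like the following in isolation. This is a sketch under assumptions: the struct is trimmed to the two fields the message names, the `toy_` functions are hypothetical, and a linear scan stands in for the rbtree walk. The comparison orders by namespace pointer first and then by name, which is the ordering the message suggests as a later change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down dirent: only the fields used for matching. */
struct toy_dirent {
    const void *s_ns;    /* namespace tag; NULL means untagged */
    const char *s_name;
};

/* Compare a (ns, name) key against an entry: namespace first, then name. */
int toy_dirent_cmp(const void *ns, const char *name,
                   const struct toy_dirent *sd)
{
    uintptr_t a = (uintptr_t)ns;
    uintptr_t b = (uintptr_t)sd->s_ns;

    if (a != b)
        return a < b ? -1 : 1;
    return strcmp(name, sd->s_name);
}

/* An entry matches if and only if both s_ns and s_name match. */
const struct toy_dirent *toy_find_dirent(const struct toy_dirent *entries,
                                         size_t n, const void *ns,
                                         const char *name)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (toy_dirent_cmp(ns, name, &entries[i]) == 0)
            return &entries[i];
    return NULL;
}

/* readdir-style filter: count the entries whose tag equals ns. */
size_t toy_count_visible(const struct toy_dirent *entries, size_t n,
                         const void *ns)
{
    size_t visible = 0;

    for (size_t i = 0; i < n; i++)
        if (entries[i].s_ns == ns)
            visible++;
    return visible;
}
```

With two namespaces each holding an `eth0`, a lookup keyed on one namespace returns only that namespace's entry, and a lookup with a NULL namespace finds neither.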
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
    dp83640: free packet queues on remove
    dp83640: use proper function to free transmit time stamping packets
    ipv6: Do not use routes from locally generated RAs
    [PATCH net-next] tg3: add tx_dropped counter
    be2net: don't create multiple RX/TX rings in multi channel mode
    be2net: don't create multiple TXQs in BE2
    be2net: refactor VF setup/teardown code into be_vf_setup/clear()
    be2net: add vlan/rx-mode/flow-control config to be_setup()
    net_sched: cls_flow: use skb_header_pointer()
    ipv4: avoid useless call of the function check_peer_pmtu
    TCP: remove TCP_DEBUG
    net: Fix driver name for mdio-gpio.c
    ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
    rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
    ipv4: fix ipsec forward performance regression
    jme: fix irq storm after suspend/resume
    route: fix ICMP redirect validation
    net: hold sock reference while processing tx timestamps
    tcp: md5: add more const attributes
    Add ethtool -g support to virtio_net
    ...

    Fix up conflicts in:
    - drivers/net/Kconfig:
    The split-up generated a trivial conflict with removal of a
    stale reference to Documentation/networking/net-modules.txt.
    Remove it from the new location instead.
    - fs/sysfs/dir.c:
    Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
    with Eric Biederman's changes for tagged directories.

    Linus Torvalds
     

20 Oct, 2011

2 commits

  • sysfs is a core piece of infrastructure that many people use and
    few people have all of the rules in their head on how to use
    it correctly. Add warnings for people using tagged directories
    improperly so that any misuses can be caught and diagnosed quickly.

    A single inexpensive test in sysfs_find_dirent is almost sufficient
    to catch all possible misuses. An additional warning is needed
    in sysfs_add_dirent so that we actually fail when attempting to
    add an untagged dirent in a tagged directory.

    Signed-off-by: Eric W. Biederman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Eric W. Biederman
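
The rule these warnings enforce can be stated as one comparison: a tagged directory needs a non-NULL tag, and an untagged directory must not be given one. The sketch below is an assumption-laden illustration with hypothetical `toy_` names; it prints to stderr where the kernel would warn.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True when the tag usage is consistent with the kind of directory. */
bool toy_ns_usage_ok(bool dir_is_tagged, const void *ns)
{
    return dir_is_tagged == (ns != NULL);
}

/* Hypothetical add path: refuse the entry instead of only warning. */
int toy_add_dirent(bool dir_is_tagged, const void *ns, const char *name)
{
    assert(name != NULL);
    if (!toy_ns_usage_ok(dir_is_tagged, ns)) {
        fprintf(stderr, "toy sysfs: ns %s in '%s' for entry '%s'\n",
                ns ? "given" : "missing",
                dir_is_tagged ? "tagged dir" : "untagged dir", name);
        return -EINVAL;
    }
    return 0;
}
```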
     
  • Now that /sys/class/net/bonding_masters is implemented as a tagged sysfs
    file we can remove support for untagged files in tagged directories.

    This change removes any ambiguity of what a NULL namespace value
    means. A NULL namespace parameter after this patch means
    that we are talking about an untagged sysfs dirent.

    This makes the sysfs code much less prone to mistakes during
    maintenance.

    Signed-off-by: Eric W. Biederman
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     

27 Sep, 2011

1 commit

  • "sysfs: use rb-tree for inode number lookup" added a new printk which
    causes a new compile warning on s390 (and a few other architectures):

    fs/sysfs/dir.c: In function 'sysfs_link_sibling':
    fs/sysfs/dir.c:63:4: warning: format '%lx' expects argument of type
    'long unsigned int', but argument 2 has type 'ino_t' [-Wform

    Add an explicit unsigned long cast since ino_t is an unsigned long on
    most architectures.

    Cc: Mikulas Patocka
    Signed-off-by: Heiko Carstens
    Signed-off-by: Greg Kroah-Hartman

    Heiko Carstens
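
The same kind of fix in user-space C, as a minimal sketch: `toy_format_ino` is a hypothetical helper, and `snprintf` stands in for printk. Casting to `unsigned long` makes the argument match `%lx` whatever the underlying type of `ino_t` is. On a platform where `ino_t` is wider than `unsigned long`, the cast would truncate the value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Format an inode number as hex; the cast keeps -Wformat quiet everywhere. */
int toy_format_ino(char *buf, size_t len, ino_t ino)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%lx", (unsigned long)ino);
}
```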
     

23 Aug, 2011

4 commits

  • sysfs: use rb-tree for inode number lookup

    This patch makes sysfs use red-black tree for inode number lookup.
    Together with a previous patch to use red-black tree for name lookup,
    this patch gives all sysfs lookups O(log n) complexity.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
     
  • sysfs: remove s_sibling hacks

    s_sibling was used for three different purposes:
    1) as a linked list of entries in the directory
    2) as a linked list of entries to be deleted
    3) as a pointer to "struct completion"

    This patch removes the hack and introduces new union u which
    holds pointers for cases 2) and 3).

    This change is needed for the following patch that removes s_sibling
    altogether and replaces it with an rb tree.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
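
A rough picture of the layout this message describes, assuming member names that are illustrative only: `s_sibling` keeps purpose 1, and a union holds the pointers for purposes 2 and 3, which are never needed at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder for the kernel's struct completion. */
struct toy_completion {
    int done;
};

struct toy_dirent {
    const char *s_name;
    struct toy_dirent *s_sibling;            /* 1) next entry in the directory */
    union {
        struct toy_dirent *removed_list;     /* 2) next entry to be deleted */
        struct toy_completion *completion;   /* 3) waiter for deactivation */
    } u;
};

/* Push sd onto a removal list without touching the sibling chain. */
void toy_queue_removal(struct toy_dirent **removed, struct toy_dirent *sd)
{
    assert(removed != NULL && sd != NULL);
    sd->u.removed_list = *removed;
    *removed = sd;
}
```

Because the removal list no longer reuses `s_sibling`, the sibling link can later be replaced by a tree node without disturbing the other two uses.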
     
  • sysfs: use rb-tree for name lookups

    Use red-black tree for name lookups.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
     
  • sysfs: count subdirectories

    This patch introduces a subdirectory counter for each sysfs directory.

    Without the patch, sysfs_refresh_inode would walk all entries of the directory
    to calculate the number of subdirectories.

    This patch improves time of "ls -la /sys/block" when there are 10000 block
    devices from 9 seconds to 0.19 seconds.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
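
The idea of the counter, sketched with hypothetical names: update a per-directory count when a child is linked or unlinked, so computing the link count becomes O(1) instead of a walk over every entry. The `+ 2` follows the usual Unix convention for a directory's link count and is an assumption about how the counter is consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical directory state: only the subdirectory counter. */
struct toy_dir {
    unsigned int subdirs;
};

/* Called when a child is added; only directories bump the counter. */
void toy_link_child(struct toy_dir *parent, bool child_is_dir)
{
    if (child_is_dir)
        parent->subdirs++;
}

/* Called when a child is removed. */
void toy_unlink_child(struct toy_dir *parent, bool child_is_dir)
{
    if (child_is_dir) {
        assert(parent->subdirs > 0);
        parent->subdirs--;
    }
}

/* Link count without walking the children. */
unsigned int toy_nlink(const struct toy_dir *dir)
{
    return dir->subdirs + 2;
}
```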
     

07 Jan, 2011

3 commits

  • Require filesystems be aware of .d_revalidate being called in rcu-walk
    mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
    -ECHILD from all implementations.

    Signed-off-by: Nick Piggin

    Nick Piggin
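
The "simple push down" amounts to an early return at the top of each implementation. In this sketch the flag value and the function are hypothetical; only the use of -ECHILD in rcu-walk mode comes from the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for the kernel's LOOKUP_RCU. */
#define TOY_LOOKUP_RCU 0x0040u

/*
 * Revalidate a cached entry. In rcu-walk mode, bail out with -ECHILD
 * rather than doing work that might need to block.
 */
int toy_d_revalidate(unsigned int flags, bool still_valid)
{
    if (flags & TOY_LOOKUP_RCU)
        return -ECHILD;
    return still_valid ? 1 : 0;
}
```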
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_delete from a dentry deletion notification to a dentry caching
    advice, more like ->drop_inode. Require it to be constant and idempotent,
    and not take d_lock. This is how all existing filesystems use the callback
    anyway.

    This makes fine grained dentry locking of dput and dentry lru scanning
    much simpler.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

22 May, 2010

3 commits

  • Add some in-line comments to explain the new infrastructure, which
    was introduced to support sysfs directory tagging with namespaces.
    I think an overall description someplace might be good too, but it
    didn't really seem to fit into Documentation/filesystems/sysfs.txt,
    which appears more geared toward users, rather than maintainers, of
    sysfs.

    (Tejun, please let me know if I can make anything clearer or failed
    altogether to comment something that should be commented.)

    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Serge E. Hallyn
     
  • I had hoped to avoid this but the bonding driver adds a file
    to /sys/class/net/ and the easiest way to handle that file is
    to make it untagged and to register it only once.

    So relax the rules on tagged directories, and make bonding work.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The problem. When implementing a network namespace I need to be able
    to have multiple network devices with the same name. Currently this
    is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
    potentially a few other directories of the form /sys/ ... /net/*.

    What this patch does is to add an additional tag field to the
    sysfs dirent structure. For directories that should show different
    contents depending on the context such as /sys/class/net/, and
    /sys/devices/virtual/net/ this tag field is used to specify the
    context in which those directories should be visible. Effectively
    this is the same as creating multiple distinct directories with
    the same name but internally to sysfs the result is nicer.

    I am calling the concept of a single directory that looks like multiple
    directories all at the same path in the filesystem tagged directories.

    For the networking namespace the set of directories whose contents I need
    to filter with tags can depend on the presence or absence of hotplug
    hardware or which modules are currently loaded. Which means I need
    a simple race free way to setup those directories as tagged.

    To achieve a race free design all tagged directories are created
    and managed by sysfs itself.

    Users of this interface:
    - define a type in the sysfs_tag_type enumeration.
    - call sysfs_register_ns_types with the type and its operations
    - call sysfs_exit_ns when an individual tag is no longer valid

    - Implement mount_ns() which returns the ns of the calling process
    so we can attach it to a sysfs superblock.
    - Implement ktype.namespace() which returns the ns of a sysfs kobject.

    Everything else is left up to sysfs and the driver layer.

    For the network namespace mount_ns and namespace() are essentially
    one line functions, and look to remain that.

    Tags are currently represented as const void * pointers as that is
    both generic, provides enough information for equality comparisons,
    and is trivial to create for current users, as it is just the
    existing namespace pointer.

    The work needed in sysfs is more extensive. At each directory
    or symlink creation I need to check if the directory it is being
    created in is a tagged directory and if so generate the appropriate
    tag to place on the sysfs_dirent. Likewise at each symlink or
    directory removal I need to check if the sysfs directory it is
    being removed from is a tagged directory and if so figure out
    which tag goes along with the name I am deleting.

    Currently only directories which hold kobjects, and
    symlinks are supported. There is not enough information
    in the current file attribute interfaces to give us anything
    to discriminate on which makes it useless, and there are
    no potential users which makes it an uninteresting problem
    to solve.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Benjamin Thery
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

08 Mar, 2010

4 commits

  • Currently sysfs_get_inode magically returns an inode on
    sysfs_sb. Make the super_block parameter explicit and
    the code becomes clearer.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • If we exclude directories and symlinks from the set of sysfs
    dirents where we need active references we are left with
    sysfs attributes (binary or not).

    - Tweak sysfs_deactivate to only do something on attributes
    - Move lockdep initialization into sysfs_file_add_mode to
    limit it to just attributes.

    Signed-off-by: Eric W. Biederman
    Acked-by: WANG Cong
    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • It turns out that holding an active reference on a directory is
    pointless. The purpose of the active references is to allow us to
    block when removing sysfs entries that have custom methods so we don't
    remove modules while running modular code and to keep those custom
    methods from accessing data structures after the files have been
    removed. Further, sysfs_remove_dir removes all elements in the
    directory before removing the directory itself, so there is no chance
    we will remove a directory with active children.

    Signed-off-by: Eric W. Biederman
    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • When sysfs_readdir stops short we now cache the next
    sysfs_dirent to return to user space in filp->private_data.
    There is no impact on the rest of sysfs by doing this and
    in the common case it allows us to pick up exactly where
    we left off with no seeking.

    Additionally I drop and regrab the sysfs_mutex around
    filldir to avoid a page fault arbitrarily increasing the
    hold time on the sysfs_mutex.

    v2: Returned to using INT_MAX as the EOF condition.
    seekdir is ambiguous unless all directory entries have
    a unique f_pos value.

    Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14949

    Signed-off-by: Eric W. Biederman
    Cc: Linus Torvalds
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

05 Jan, 2010

1 commit

  • Holding locks over device_del -> kobject_del -> sysfs_deactivate can
    cause deadlocks if those same locks are grabbed in sysfs show or store
    methods.

    I model the s_active count + completion as a sleeping read/write lock.
    I describe to lockdep sysfs_get_active as a read_trylock,
    sysfs_put_active as a read_unlock, and sysfs_deactivate as a
    write_lock and write_unlock pair. This seems to capture the essence
    for purposes of finding deadlocks, and in my testing finds real
    issues and ignores non-issues.

    This brings us back to the point that holding locks over kobject_del is a problem
    that ideally we should find a way of addressing, but at least lockdep
    can tell us about the problems instead of requiring developers to debug
    rare strange system deadlocks, that happen when sysfs files are removed
    while being written to.

    Signed-off-by: Eric W. Biederman
    Acked-by: Tejun Heo
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
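
The read/write lock analogy can be made concrete with a small C11 sketch. It is an illustration under assumptions, not the sysfs implementation: the `toy_` names and the choice of INT_MIN as the bias are hypothetical here, and the completion that a remover would sleep on is reduced to a return value.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: a negative count means "deactivated". */
#define TOY_DEACTIVATED_BIAS INT_MIN

/* Like read_trylock: take a reference unless the entry is deactivated. */
bool toy_get_active(atomic_int *s_active)
{
    int v = atomic_load(s_active);

    while (v >= 0) {
        /* On failure v is reloaded and the sign check is repeated. */
        if (atomic_compare_exchange_weak(s_active, &v, v + 1))
            return true;
    }
    return false;
}

/* Like read_unlock: drop a reference taken by toy_get_active. */
void toy_put_active(atomic_int *s_active)
{
    atomic_fetch_sub(s_active, 1);
}

/*
 * Like write_lock: block new readers by adding the bias. Returns true
 * if no reader was active; otherwise a real remover would sleep until
 * the last toy_put_active brings the count down to the bias.
 */
bool toy_deactivate(atomic_int *s_active)
{
    int old = atomic_fetch_add(s_active, TOY_DEACTIVATED_BIAS);

    assert(old >= 0); /* deactivate only once */
    return old == 0;
}
```

A show or store method that holds an active reference and then waits for a lock the remover already holds is the deadlock shape this annotation is meant to expose.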
     

12 Dec, 2009

7 commits

  • These two functions do 90% of the same work and it doesn't significantly
    obfuscate the function to allow both the parent dir and the name to change
    at the same time. So merge them together to simplify maintenance, and
    increase testing.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • By teaching sysfs_revalidate to hide a dentry for
    a sysfs_dirent if the sysfs_dirent has been renamed,
    and by teaching sysfs_lookup to return the original
    dentry if the sysfs dirent has been renamed, I can
    show the results of renames correctly without having to
    update the dcache during the directory rename.

    This massively simplifies the rename logic allowing a lot
    of weird sysfs special cases to be removed along with
    a lot of now unnecessary helper code.

    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • With lazy inode updates and dentry operations bringing everything
    into sync on demand there is no longer any need to immediately
    update the vfs or grab i_mutex to protect those updates as we
    make changes to sysfs.

    Acked-by: Serge Hallyn
    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • With the implementation of sysfs_getattr and sysfs_permission
    sysfs becomes able to lazily propagate inode attribute changes
    from the sysfs_dirents to the vfs inodes. This paves the way
    for deleting significant chunks of now unnecessary code.

    While doing this we did not reference sysfs_setattr from
    sysfs_symlink_inode_operations so I added it along with
    sysfs_getattr and sysfs_permission.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Currently sysfs updates the timestamps on the vfs directory
    inode when we create or remove a directory entry but doesn't
    update the cached copy on the sysfs_dirent. Fix that oversight.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Calling d_drop unconditionally when a sysfs_dirent is deleted has
    the potential to leak mounts, so instead implement dentry delete
    and revalidate operations that cause sysfs dentries to be removed
    at the appropriate time.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Using dentry instead of d in the function name is what
    several other filesystems are doing and it seems to be
    a more readable convention.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

05 Nov, 2009

1 commit


15 Oct, 2009

1 commit


10 Sep, 2009

1 commit

  • This patch adds a setxattr handler to the file, directory, and symlink
    inode_operations structures for sysfs. The patch uses hooks introduced in the
    previous patch to handle the getting and setting of security information for
    the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
    sysfs_dirent structure has been replaced by a structure which contains the
    iattr, secdata and secdata length to allow the changes to persist in the event
    that the inode representing the sysfs_dirent is evicted. Because sysfs only
    stores this information when a change is made all the optional data is moved
    into one dynamically allocated field.

    This patch addresses an issue where SELinux was denying virtd access to the PCI
    configuration entries in sysfs. The lack of setxattr handlers for sysfs
    required that a single label be assigned to all entries in sysfs. Granting virtd
    access to every entry in sysfs is not an acceptable solution so fine grained
    labeling of sysfs is required such that individual entries can be labeled
    appropriately.

    [sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.]

    Signed-off-by: David P. Quigley
    Signed-off-by: Stephen D. Smalley
    Signed-off-by: James Morris

    David P. Quigley
     

29 Jul, 2009

1 commit

  • Update directory hardlink count when moving kobjects to a new parent.
    Fixes the following problem which occurs when several devices are
    moved to the same parent and then unregistered:

    > ls -laF /sys/devices/css0/defunct/
    > total 0
    > drwxr-xr-x 4294967295 root root 0 2009-07-14 17:02 ./
    > drwxr-xr-x 114 root root 0 2009-07-14 17:02 ../
    > drwxr-xr-x 2 root root 0 2009-07-14 17:01 power/
    > -rw-r--r-- 1 root root 4096 2009-07-14 17:01 uevent

    Signed-off-by: Peter Oberparleiter
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Peter Oberparleiter
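
The 4294967295 in the listing is what an unsigned 32-bit link count shows after one decrement too many. A minimal sketch, with hypothetical names, of the bookkeeping a move needs so that a later removal decrements a count that was actually incremented:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical directory state: only the link count. */
struct toy_kobj_dir {
    uint32_t nlink;
};

/* Move one subdirectory: both parents must be updated together. */
void toy_move_subdir(struct toy_kobj_dir *old_parent,
                     struct toy_kobj_dir *new_parent)
{
    assert(old_parent->nlink > 0);
    old_parent->nlink--;
    new_parent->nlink++;
}

/* Remove one subdirectory from its current parent. */
void toy_remove_subdir(struct toy_kobj_dir *parent)
{
    parent->nlink--; /* wraps to 4294967295 if it was already 0 */
}
```

If the move skips the increment on the new parent, each later `toy_remove_subdir` still decrements it, and the count eventually wraps.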
     

28 Mar, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
    fs: avoid I_NEW inodes
    Merge code for single and multiple-instance mounts
    Remove get_init_pts_sb()
    Move common mknod_ptmx() calls into caller
    Parse mount options just once and copy them to super block
    Unroll essentials of do_remount_sb() into devpts
    vfs: simple_set_mnt() should return void
    fs: move bdev code out of buffer.c
    constify dentry_operations: rest
    constify dentry_operations: configfs
    constify dentry_operations: sysfs
    constify dentry_operations: JFS
    constify dentry_operations: OCFS2
    constify dentry_operations: GFS2
    constify dentry_operations: FAT
    constify dentry_operations: FUSE
    constify dentry_operations: procfs
    constify dentry_operations: ecryptfs
    constify dentry_operations: CIFS
    constify dentry_operations: AFS
    ...

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     

25 Mar, 2009

2 commits

  • Modify sysfs bin files so that we can remove a bin file while it is
    still mapped. When the kobject is removed we unmap the bin file and
    arrange for future accesses to the mapping to receive SIGBUS.

    Implementing this prevents a nasty DoS when pci devices are hot plugged
    and unplugged: if any of their resources were mmapped, the kernel
    could not free up their pci resources or release their pci data
    structures.

    [akpm@linux-foundation.org: remove unused var]
    Signed-off-by: Eric W. Biederman
    Cc: Jesse Barnes
    Acked-by: Tejun Heo
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • sysfs: sysfs_add_one WARNs with full path to duplicate filename

    As a debugging aid, it can be useful to know the full path to a
    duplicate file being created in sysfs.

    We now will display warnings such as:

    sysfs: cannot create duplicate filename '/foo'

    when attempting to create multiple files named 'foo' in the sysfs
    root, or:

    sysfs: cannot create duplicate filename '/bus/pci/slots/5/foo'

    when attempting to create multiple files named 'foo' under a
    given directory in sysfs.

    The path displayed is always a relative path to sysfs_root. The
    leading '/' in the path name refers to the sysfs_root mount
    point, and should not be confused with the "real" '/'.

    Thanks to Alex Williamson for essentially writing sysfs_pathname.

    Cc: Alex Williamson
    Signed-off-by: Alex Chiang
    Signed-off-by: Greg Kroah-Hartman

    Alex Chiang
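
One way to build such a path is to recurse up to the root and append names on the way back down. This is a sketch with hypothetical types and names, not the sysfs_pathname code itself. The caller must pass a buffer that already holds an empty string, and the root node contributes nothing, so every path starts with '/'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node: a name and a pointer to the parent. */
struct toy_node {
    const char *name;
    const struct toy_node *parent; /* NULL for the root */
};

/* Append the path of node, relative to the root, to the string in buf. */
void toy_pathname(const struct toy_node *node, char *buf, size_t len)
{
    assert(buf != NULL && len > 0 && strlen(buf) < len);
    if (node->parent == NULL)
        return; /* the root itself adds no component */

    toy_pathname(node->parent, buf, len);
    /* strncat always terminates; the bound leaves room for the NUL. */
    strncat(buf, "/", len - strlen(buf) - 1);
    strncat(buf, node->name, len - strlen(buf) - 1);
}
```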
     

23 Oct, 2008

1 commit


17 Oct, 2008

3 commits

  • It finally dawned on me what the clean fix to sysfs_rename_dir
    calling kobject_set_name is. Move the work into kobject_rename
    where it belongs. The callers serialize us anyway so this is
    safe.

    Signed-off-by: Eric W. Biederman
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • As inode creation is protected by sysfs_mutex, ilookup5_nowait()
    always either fails to find at all or finds one which is fully
    initialized, so using ilookup5_nowait() or ilookup5() doesn't make any
    difference. Switch to ilookup5() as ilookup5_nowait() is planned to be removed. This
    change also makes lookup return value handling a bit simpler.

    This change was suggested by Al Viro.

    Signed-off-by: Tejun Heo
    Cc: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • Support sysfs_notify from atomic context with new sysfs_notify_dirent

    sysfs_notify currently takes sysfs_mutex.
    This means that it cannot be called in atomic context.
    sysfs_mutex is sometimes held over a malloc (sysfs_rename_dir)
    so it can block on low memory.

    In md I want to be able to notify on a sysfs attribute from
    atomic context, and I don't want to block on low memory because I
    could be in the writeout path for freeing memory.

    So:
    - export the "sysfs_dirent" structure along with sysfs_get, sysfs_put
    and sysfs_get_dirent so I can get the sysfs_dirent that I want to
    notify on and hold it in an md structure.
    - split sysfs_notify_dirent out of sysfs_notify so the sysfs_dirent
    can be notified on with no blocking (just a spinlock).

    Signed-off-by: Neil Brown
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Neil Brown