05 Jan, 2010

1 commit

  • Holding locks over device_del -> kobject_del -> sysfs_deactivate can
    cause deadlocks if those same locks are grabbed in sysfs show or store
    methods.

    The I model s_active count + completion as a sleeping read/write lock.
    I describe to lockdep sysfs_get_active as a read_trylock,
    sysfs_put_active as a read_unlock, and sysfs_deactivate as a
    write_lock and write_unlock pair. This seems to capture the essence
    for purposes of finding deadlocks, and in my testing gives finds real
    issues and ignores non-issues.

    This brings us back to holding locks over kobject_del is a problem
    that ideally we should find a way of addressing, but at least lockdep
    can tell us about the problems instead of requiring developers to debug
    rare strange system deadlocks, that happen when sysfs files are removed
    while being written to.

    Signed-off-by: Eric W. Biederman
    Acked-by: Tejun Heo
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

24 Dec, 2009

1 commit


12 Dec, 2009

15 commits

  • inode_change_ok already clears the SGID bit when necessary
    so there is no reason for sysfs_setattr to carry code to do
    the same, and it is good to kill the extra copy because when
    I moved the code last in certain corner cases the code will
    look at the wrong gid.

    Acked-by: Serge Hallyn
    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • These two functions do 90% of the same work and it doesn't significantly
    obfuscate the function to allow both the parent dir and the name to change
    at the same time. So merge them together to simplify maintenance, and
    increase testing.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • By teaching sysfs_revalidate to hide a dentry for
    a sysfs_dirent if the sysfs_dirent has been renamed,
    and by teaching sysfs_lookup to return the original
    dentry if the sysfs dirent has been renamed. I can
    show the results of renames correctly without having to
    update the dcache during the directory rename.

    This massively simplifies the rename logic allowing a lot
    of weird sysfs special cases to be removed along with
    a lot of now unnecesary helper code.

    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • With lazy inode updates and dentry operations bringing everything
    into sync on demand there is no longer any need to immediately
    update the vfs or grab i_mutex to protect those updates as we
    make changes to sysfs.

    Acked-by: Serge Hallyn
    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Now that sysfs_getattr and sysfs_permission refresh the vfs
    inode there is no need to immediatly push the mode change
    into the vfs cache. Reducing the amount of work needed and
    simplifying the locking.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • With the implementation of sysfs_getattr and sysfs_permission
    sysfs becomes able to lazily propogate inode attribute changes
    from the sysfs_dirents to the vfs inodes. This paves the way
    for deleting significant chunks of now unnecessary code.

    While doing this we did not reference sysfs_setattr from
    sysfs_symlink_inode_operations so I added along with
    sysfs_getattr and sysfs_permission.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Lining up the functions in sysfs_symlink_inode_operations
    follows the pattern in the rest of sysfs and makes things
    slightly more readable.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Currently sysfs updates the timestamps on the vfs directory
    inode when we create or remove a directory entry but doesn't
    update the cached copy on the sysfs_dirent, fix that oversight.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Cleanly separate the work that is specific to setting the
    attributes of a sysfs_dirent from what is needed to update
    the attributes of a vfs inode.

    Additionally grab the sysfs_mutex to keep any nasties from
    surprising us when updating the sysfs_dirent.

    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The granularity of sysfs time when we keep it is 1 ns. Which
    when passed to timestamp_trunc results in a nop. So remove
    the unnecessary function call making sysfs_setattr slightly
    easier to read.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Currently every caller of sysfs_chmod_file happens at either
    file creation time to set a non-default mode or in response
    to a specific user requested space change in policy. Making
    timestamps of when the chmod happens and notification of
    a file changing mode uninteresting.

    Remove the unnecessary time stamp and filesystem change
    notification, and removes the last of the explicit inotify
    and donitfy support from sysfs.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Calling d_drop unconditionally when a sysfs_dirent is deleted has
    the potential to leak mounts, so instead implement dentry delete
    and revalidate operations that cause sysfs dentries to be removed
    at the appropriate time.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Using dentry instead of d in the function name is what
    several other filesystems are doing and it seems to be
    a more readable convention.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The sysfs_mutex is required to ensure updates are and will remain
    atomic with respect to other inode iattr updates, that do not happen
    through the filesystem.

    Acked-by: Serge Hallyn
    Acked-by: Tejun Heo
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Signed-off-by: Stefan Richter
    Acked-by: David P. Quigley
    Signed-off-by: Greg Kroah-Hartman

    Stefan Richter
     

05 Nov, 2009

1 commit


15 Oct, 2009

2 commits

  • sysfs_notify_dirent is a simple atomic operation that can be used to
    alert user-space that new data can be read from a sysfs attribute.

    Unfortunately it cannot currently be called from non-process context
    because of its use of spin_lock which is sometimes taken with
    interrupts enabled.

    So change all lockers of sysfs_open_dirent_lock to disable interrupts,
    thus making sysfs_notify_dirent safe to be called from non-process
    context (as drivers/md does in md_safemode_timeout).

    sysfs_get_open_dirent is (documented as being) only called from
    process context, so it uses spin_lock_irq. Other places
    use spin_lock_irqsave.

    The usage for sysfs_notify_dirent in md_safemode_timeout was
    introduced in 2.6.28, so this patch is suitable for that and more
    recent kernels.

    Reported-by: Joel Andres Granados
    Signed-off-by: NeilBrown
    Signed-off-by: Dan Williams
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Neil Brown
     
  • As device_move() and kobject_move() both handle a NULL destination,
    sysfs_move_dir() should do this as well (again) and fall back to
    sysfs_root in that case.

    Signed-off-by: Cornelia Huck
    Cc: Phil Carmody
    Signed-off-by: Greg Kroah-Hartman

    Cornelia Huck
     

28 Sep, 2009

1 commit


12 Sep, 2009

1 commit

  • * 'writeback' of git://git.kernel.dk/linux-2.6-block:
    writeback: check for registered bdi in flusher add and inode dirty
    writeback: add name to backing_dev_info
    writeback: add some debug inode list counters to bdi stats
    writeback: get rid of pdflush completely
    writeback: switch to per-bdi threads for flushing data
    writeback: move dirty inodes from super_block to backing_dev_info
    writeback: get rid of generic_sync_sb_inodes() export

    Linus Torvalds
     

11 Sep, 2009

1 commit


10 Sep, 2009

1 commit

  • This patch adds a setxattr handler to the file, directory, and symlink
    inode_operations structures for sysfs. The patch uses hooks introduced in the
    previous patch to handle the getting and setting of security information for
    the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
    sysfs_dirent structure has been replaced by a structure which contains the
    iattr, secdata and secdata length to allow the changes to persist in the event
    that the inode representing the sysfs_dirent is evicted. Because sysfs only
    stores this information when a change is made all the optional data is moved
    into one dynamically allocated field.

    This patch addresses an issue where SELinux was denying virtd access to the PCI
    configuration entries in sysfs. The lack of setxattr handlers for sysfs
    required that a single label be assigned to all entries in sysfs. Granting virtd
    access to every entry in sysfs is not an acceptable solution so fine grained
    labeling of sysfs is required such that individual entries can be labeled
    appropriately.

    [sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.]

    Signed-off-by: David P. Quigley
    Signed-off-by: Stephen D. Smalley
    Signed-off-by: James Morris

    David P. Quigley
     

29 Jul, 2009

1 commit

  • Update directory hardlink count when moving kobjects to a new parent.
    Fixes the following problem which occurs when several devices are
    moved to the same parent and then unregistered:

    > ls -laF /sys/devices/css0/defunct/
    > total 0
    > drwxr-xr-x 4294967295 root root 0 2009-07-14 17:02 ./
    > drwxr-xr-x 114 root root 0 2009-07-14 17:02 ../
    > drwxr-xr-x 2 root root 0 2009-07-14 17:01 power/
    > -rw-r--r-- 1 root root 4096 2009-07-14 17:01 uevent

    Signed-off-by: Peter Oberparleiter
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Peter Oberparleiter
     

09 Jul, 2009

1 commit


16 Jun, 2009

1 commit


29 May, 2009

1 commit


21 Apr, 2009

1 commit


17 Apr, 2009

2 commits

  • Currently, following test programs don't finished.

    % ruby -e '
    Thread.new { sleep }
    File.read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies")
    '

    strace expose the reason.

    ...
    open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies", O_RDONLY|O_LARGEFILE) = 3
    ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf9fa6b8) = -1 ENOTTY (Inappropriate ioctl for device)
    fstat64(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
    _llseek(3, 0, [0], SEEK_CUR) = 0
    select(4, [3], NULL, NULL, NULL) = 1 (in [3])
    read(3, "1400000 1300000 1200000 1100000 1"..., 4096) = 62
    select(4, [3], NULL, NULL, NULL

    Because Ruby (the scripting language) VM assume select system-call
    against regular file don't block. it because SUSv3 says "Regular files
    shall always poll TRUE for reading and writing". see
    http://www.opengroup.org/onlinepubs/009695399/functions/poll.html it
    seems valid assumption.

    But sysfs_poll() don't keep this rule although sysfs file can read and
    write always.

    This patch restore proper poll behavior to sysfs.
    /sys/block/md*/md/sync_action polling application and another sysfs
    updating sensitive application still can use POLLERR and POLLPRI.

    Cc: Neil Brown
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Greg Kroah-Hartman

    KOSAKI Motohiro
     
  • A sysfs attribute using sysfs_schedule_callback() to commit suicide
    may end up calling device_unregister(), which will eventually call
    a driver's ->remove function.

    Drivers may call flush_scheduled_work() in their shutdown routines,
    in which case lockdep will complain with something like the following:

    =============================================
    [ INFO: possible recursive locking detected ]
    2.6.29-rc8-kk #1
    ---------------------------------------------
    events/4/56 is trying to acquire lock:
    (events){--..}, at: [] flush_workqueue+0x0/0xa0

    but task is already holding lock:
    (events){--..}, at: [] run_workqueue+0x108/0x230

    other info that might help us debug this:
    3 locks held by events/4/56:
    #0: (events){--..}, at: [] run_workqueue+0x108/0x230
    #1: (&ss->work){--..}, at: [] run_workqueue+0x108/0x230
    #2: (pci_remove_rescan_mutex){--..}, at: [] remove_callback+0x21/0x40

    stack backtrace:
    Pid: 56, comm: events/4 Not tainted 2.6.29-rc8-kk #1
    Call Trace:
    [] validate_chain+0xb7d/0x1260
    [] __lock_acquire+0x42e/0xa40
    [] lock_acquire+0x58/0x80
    [] ? flush_workqueue+0x0/0xa0
    [] flush_workqueue+0x4d/0xa0
    [] ? flush_workqueue+0x0/0xa0
    [] flush_scheduled_work+0x10/0x20
    [] e1000_remove+0x55/0xfe [e1000e]
    [] ? sysfs_schedule_callback_work+0x0/0x50
    [] pci_device_remove+0x32/0x70
    [] __device_release_driver+0x59/0x90
    [] device_release_driver+0x2b/0x40
    [] bus_remove_device+0xa6/0x120
    [] device_del+0x12b/0x190
    [] device_unregister+0x26/0x70
    [] pci_stop_dev+0x49/0x60
    [] pci_remove_bus_device+0x40/0xc0
    [] remove_callback+0x29/0x40
    [] sysfs_schedule_callback_work+0x1f/0x50
    [] run_workqueue+0x15a/0x230
    [] ? run_workqueue+0x108/0x230
    [] worker_thread+0x9f/0x100
    [] ? autoremove_wake_function+0x0/0x40
    [] ? worker_thread+0x0/0x100
    [] kthread+0x4d/0x80
    [] child_rip+0xa/0x20
    [] ? restore_args+0x0/0x30
    [] ? kthread+0x0/0x80
    [] ? child_rip+0x0/0x20

    Although we know that the device_unregister path will never acquire
    a lock that a driver might try to acquire in its ->remove, in general
    we should never attempt to flush a workqueue from within the same
    workqueue, and lockdep rightly complains.

    So as long as sysfs attributes cannot commit suicide directly and we
    are stuck with this callback mechanism, put the sysfs callbacks on
    their own workqueue instead of the global one.

    This has the side benefit that if a suicidal sysfs attribute kicks
    off a long chain of ->remove callbacks, we no longer induce a long
    delay on the global queue.

    This also fixes a missing module_put in the error path introduced
    by sysfs-only-allow-one-scheduled-removal-callback-per-kobj.patch.

    We never destroy the workqueue, but I'm not sure that's a
    problem.

    Reported-by: Kenji Kaneshige
    Tested-by: Kenji Kaneshige
    Signed-off-by: Alex Chiang
    Signed-off-by: Greg Kroah-Hartman

    Alex Chiang
     

01 Apr, 2009

1 commit

  • Fix warnings and return values in sysfs bin_page_mkwrite(), fixing
    fs/sysfs/bin.c: In function `bin_page_mkwrite':
    fs/sysfs/bin.c:250: warning: passing argument 2 of `bb->vm_ops->page_mkwrite' from incompatible pointer type
    fs/sysfs/bin.c: At top level:
    fs/sysfs/bin.c:280: warning: initialization from incompatible pointer type

    Expects to have my [PATCH next] sysfs: fix some bin_vm_ops errors

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Mar, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
    fs: avoid I_NEW inodes
    Merge code for single and multiple-instance mounts
    Remove get_init_pts_sb()
    Move common mknod_ptmx() calls into caller
    Parse mount options just once and copy them to super block
    Unroll essentials of do_remount_sb() into devpts
    vfs: simple_set_mnt() should return void
    fs: move bdev code out of buffer.c
    constify dentry_operations: rest
    constify dentry_operations: configfs
    constify dentry_operations: sysfs
    constify dentry_operations: JFS
    constify dentry_operations: OCFS2
    constify dentry_operations: GFS2
    constify dentry_operations: FAT
    constify dentry_operations: FUSE
    constify dentry_operations: procfs
    constify dentry_operations: ecryptfs
    constify dentry_operations: CIFS
    constify dentry_operations: AFS
    ...

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     

25 Mar, 2009

6 commits

  • Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a
    "sysfs: don't block indefinitely for unmapped files" in linux-next
    crashes the PowerMac G5 when X starts up. It's caught out by the way
    powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting
    a new vma->vm_file whose private_data no longer points to the bin_buffer
    (substitution done because some versions of X crash if that mmap fails).

    The fix to this is straightforward: the original vm_file is fput() in
    that case, so this mmap won't block sysfs at all, so just don't switch
    over to bin_vm_ops if vm_file has changed.

    But more fixes made before realizing that was the problem:-

    It should not be an error if bin_page_mkwrite() finds no underlying
    page_mkwrite().

    Check that a file already mmap'ed has the same underlying vm_ops
    _before_ pointing vma->vm_ops at bin_vm_ops.

    If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap
    on CONFIG_NUMA=y, just because that has a set_policy and get_policy:
    provide bin_set_policy, bin_get_policy and bin_migrate.

    Signed-off-by: Hugh Dickins
    Acked-by: Eric Biederman
    Signed-off-by: Greg Kroah-Hartman

    Hugh Dickins
     
  • The only way for a sysfs attribute to remove itself (without
    deadlock) is to use the sysfs_schedule_callback() interface.

    Vegard Nossum discovered that a poorly written sysfs ->store
    callback can repeatedly schedule remove callbacks on the same
    device over and over, e.g.

    $ while true ; do echo 1 > /sys/devices/.../remove ; done

    If the 'remove' attribute uses the sysfs_schedule_callback API
    and also does not protect itself from concurrent accesses, its
    callback handler will be called multiple times, and will
    eventually attempt to perform operations on a freed kobject,
    leading to many problems.

    Instead of requiring all callers of sysfs_schedule_callback to
    implement their own synchronization, provide the protection in
    the infrastructure.

    Now, sysfs_schedule_callback will only allow one scheduled
    callback per kobject. On subsequent calls with the same kobject,
    return -EAGAIN.

    This is a short term fix. The long term fix is to allow sysfs
    attributes to remove themselves directly, without any of this
    callback hokey pokey.

    [cornelia.huck@de.ibm.com: s390 ccwgroup bits]

    Reported-by: vegard.nossum@gmail.com
    Signed-off-by: Alex Chiang
    Acked-by: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

    Alex Chiang
     
  • Modify sysfs bin files so that we can remove the bin file while they are
    still mapped. When the kobject is removed we unmap the bin file and
    arrange for future accesses to the mapping to receive SIGBUS.

    Implementing this prevents a nasty DOS when pci devices are hot plugged
    and unplugged. Where if any of their resources were mmaped the kernel
    could not free up their pci resources or release their pci data
    structures.

    [akpm@linux-foundation.org: remove unused var]
    Signed-off-by: Eric W. Biederman
    Cc: Jesse Barnes
    Acked-by: Tejun Heo
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The sysfs_dirent serves as both an inode and a directory entry
    for sysfs. To prevent the sysfs inode numbers from being freed
    prematurely hold a reference to sysfs_dirent from the sysfs inode.

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Eric W. Biederman
    Cc: Tejun Heo
    Cc: Al Viro
    Cc: Cornelia Huck
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • sysfs: sysfs_add_one WARNs with full path to duplicate filename

    As a debugging aid, it can be useful to know the full path to a
    duplicate file being created in sysfs.

    We now will display warnings such as:

    sysfs: cannot create duplicate filename '/foo'

    when attempting to create multiple files named 'foo' in the sysfs
    root, or:

    sysfs: cannot create duplicate filename '/bus/pci/slots/5/foo'

    when attempting to create multiple files named 'foo' under a
    given directory in sysfs.

    The path displayed is always a relative path to sysfs_root. The
    leading '/' in the path name refers to the sysfs_root mount
    point, and should not be confused with the "real" '/'.

    Thanks to Alex Williamson for essentially writing sysfs_pathname.

    Cc: Alex Williamson
    Signed-off-by: Alex Chiang
    Signed-off-by: Greg Kroah-Hartman

    Alex Chiang
     
  • sysfs_get_inode ultimately calls sysfs_count_nlink when the a
    directory inode is fectched. sysfs_count_nlink needs to be
    called under the sysfs_mutex to guard against the unlikely
    but possible scenario that the root directory is changing
    as we are counting the number entries in it, and just in
    general to be consistent.

    Signed-off-by: Eric W. Biederman
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman