18 Dec, 2012

1 commit

  • Pull user namespace changes from Eric Biederman:
    "While small this set of changes is very significant with respect to
    containers in general and user namespaces in particular. The user
    space interface is now complete.

    This set of changes adds support for unprivileged users to create user
    namespaces and as a user namespace root to create other namespaces.
    The tyranny of supporting suid root preventing unprivileged users from
    using cool new kernel features is broken.

    This set of changes completes the work on setns, adding support for
    the pid, user, mount namespaces.

    This set of changes includes a bunch of basic pid namespace
    cleanups/simplifications. Of particular significance is the rework of
    the pid namespace cleanup so it no longer requires sending out
    tendrils into all kinds of unexpected cleanup paths for operation. At
    least one case of broken error handling is fixed by this cleanup.

    The files under /proc//ns/ have been converted from regular files
    to magic symlinks which prevents incorrect caching by the VFS,
    ensuring the files always refer to the namespace the process is
    currently using and ensuring that the ptrace_may_access permission
    checks are always applied.

    The files under /proc//ns/ have been given stable inode numbers
    so it is now possible to see if different processes share the same
    namespaces.

    Through David Miller's net tree are changes to relax many of the
    permission checks in the networking stack to allow the user
    namespace root to usefully use the networking stack. Similar changes
    for the mount namespace and the pid namespace are coming through my
    tree.

    Two small changes to add user namespace support were committed here and
    in David Miller's -net tree so that I could complete the work on the
    /proc//ns/ files in this tree.

    Work remains to make it safe to build user namespaces and 9p, afs,
    ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
    Kconfig guard remains in place preventing user namespaces from
    being built when any of those filesystems are enabled.

    Future design work remains to allow root users outside of the initial
    user namespace to mount more than just /proc and /sys."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
    proc: Usable inode numbers for the namespace file descriptors.
    proc: Fix the namespace inode permission checks.
    proc: Generalize proc inode allocation
    userns: Allow unprivilged mounts of proc and sysfs
    userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
    procfs: Print task uids and gids in the userns that opened the proc file
    userns: Implement unshare of the user namespace
    userns: Implent proc namespace operations
    userns: Kill task_user_ns
    userns: Make create_new_namespaces take a user_ns parameter
    userns: Allow unprivileged use of setns.
    userns: Allow unprivileged users to create new namespaces
    userns: Allow setting a userns mapping to your current uid.
    userns: Allow chown and setgid preservation
    userns: Allow unprivileged users to create user namespaces.
    userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
    userns: fix return value on mntns_install() failure
    vfs: Allow unprivileged manipulation of the mount namespace.
    vfs: Only support slave subtrees across different user namespaces
    vfs: Add a user namespace reference from struct mnt_namespace
    ...

    Linus Torvalds
     

27 Nov, 2012

1 commit


20 Nov, 2012

1 commit


25 Oct, 2012

1 commit

  • The warning check for duplicate sysfs entries can cause a buffer overflow
    when printing the warning, as strcat() doesn't check buffer sizes.
    Use strlcat() instead.

    Since strlcat() doesn't return a pointer to the passed buffer, unlike
    strcat(), I had to convert the nested concatenation in sysfs_add_one() to
    an admittedly more obscure comma operator construct, to avoid emitting code
    for the concatenation if CONFIG_BUG is disabled.

    Signed-off-by: Geert Uytterhoeven
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
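
    The difference between the two calls can be sketched in userspace.
    Since strlcat() is not available in every C library, the block below
    uses a hypothetical stand-in, bounded_cat(), written to follow the
    usual strlcat() contract (never write past size bytes, always
    terminate, return the length it tried to create). It is an
    illustration, not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append src to dst without writing more than size bytes in total.
 * Returns the length the full string would have had, so a return
 * value >= size tells the caller that truncation happened.
 */
size_t bounded_cat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0, slen, room, n;

    assert(dst != NULL && src != NULL);
    slen = strlen(src);

    /* Find the end of dst, but never look beyond the buffer. */
    while (dlen < size && dst[dlen] != '\0')
        dlen++;
    if (dlen == size)           /* dst was not terminated within size */
        return size + slen;

    room = size - dlen - 1;     /* bytes left, keeping one for the NUL */
    n = slen < room ? slen : room;
    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;
}
```

    Because the return value is a length rather than the buffer, calls
    cannot be nested the way strcat() calls can. An expression such as
    (bounded_cat(buf, a, n), bounded_cat(buf, b, n), buf) chains them
    with the comma operator and still yields the buffer.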
     

05 Sep, 2012

1 commit


02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

31 Jul, 2012

1 commit


27 Jul, 2012

1 commit

  • Pull driver core changes from Greg Kroah-Hartman:
    "Here's the big driver core pull request for 3.6-rc1.

    Unlike 3.5, this kernel should be a lot tamer, with the printk changes
    now settled down. All we have here is some extcon driver updates, w1
    driver updates, a few printk cleanups that weren't needed for 3.5, but
    are good to have now, and some other minor fixes/changes in the driver
    core.

    All of these have been in the linux-next releases for a while now.

    Signed-off-by: Greg Kroah-Hartman"

    * tag 'driver-core-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
    printk: Export struct log size and member offsets through vmcoreinfo
    Drivers: hv: Change the hex constant to a decimal constant
    driver core: don't trigger uevent after failure
    extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device
    sysfs: fail dentry revalidation after namespace change fix
    sysfs: fail dentry revalidation after namespace change
    extcon: spelling of detach in function doc
    extcon: arizona: Stop microphone detection if we give up on it
    extcon: arizona: Update cable reporting calls and split headset
    PM / Runtime: Do not increment device usage counts before probing
    kmsg - do not flush partial lines when the console is busy
    kmsg - export "continuation record" flag to /dev/kmsg
    kmsg - avoid warning for CONFIG_PRINTK=n compilations
    kmsg - properly print over-long continuation lines
    driver-core: Use kobj_to_dev instead of re-implementing it
    driver-core: Move kobj_to_dev from genhd.h to device.h
    driver core: Move deferred devices to the end of dpm_list before probing
    driver core: move uevent call to driver_register
    driver core: fix shutdown races with probe/remove(v3)
    Extcon: Arizona: Add driver for Wolfson Arizona class devices
    ...

    Linus Torvalds
     

18 Jul, 2012

2 commits

  • don't assume that KOBJ_NS_TYPE_NONE==0. Also save a test-n-branch.

    Cc: Eric W. Biederman
    Cc: Glauber Costa
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Acked-by: Serge E. Hallyn
    Signed-off-by: Greg Kroah-Hartman

    Andrew Morton
     
  • When we change the namespace tag of a sysfs entry, the associated dentry
    is still kept around. readdir() will work correctly and not display the
    old entries, but open() will still succeed, so will reads and writes.

    This will no longer happen if sysfs is remounted, hinting that this is a
    cache-related problem.

    I am using the following sequence to demonstrate that:

    shell1:
    ip link add type veth
    unshare -nm

    shell2:
    ip link set veth1
    cat /sys/devices/virtual/net/veth1/ifindex

    Before this patch, this will succeed (fail to fail). After it, it will
    correctly return an error. Unlike a normal rename, which we
    handle fine, changing the object namespace will keep its path intact.
    So this check seems necessary as well.

    [ v2: get type from parent, as suggested by Eric Biederman ]

    Signed-off-by: Glauber Costa
    CC: Tejun Heo
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Glauber Costa
     

14 Jul, 2012

5 commits


29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

24 May, 2012

1 commit

  • Pull user namespace enhancements from Eric Biederman:
    "This is a course correction for the user namespace, so that we can
    reach an inexpensive, maintainable, and reasonably complete
    implementation.

    Highlights:
    - Config guards make it impossible to enable the user namespace and
    code that has not been converted to be user namespace safe.

    - Use of the new kuid_t type ensures that if you somehow get past the
    config guards the kernel will encounter type errors if you enable
    user namespaces and attempt to compile in code whose permission
    checks have not been updated to be user namespace safe.

    - All uids from child user namespaces are mapped into the initial
    user namespace before they are processed. Removing the need to add
    an additional check to see if the user namespace of the compared
    uids remains the same.

    - With the user namespaces compiled out the performance is as good or
    better than it is today.

    - For most operations absolutely nothing changes, in performance or
    in operation, with the user namespace enabled.

    - The worst case performance I could come up with was timing 1
    billion cache cold stat operations with the user namespace code
    enabled. This went from 156s to 164s on my laptop (or 156ns to
    164ns per stat operation).

    - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
    Most uid/gid setting system calls treat these values specially
    anyway so attempting to use -1 as a uid would likely cause
    entertaining failures in userspace.

    - If setuid is called with a uid that can not be mapped setuid fails.
    I have looked at sendmail, login, ssh and every other program I
    could think of that would call setuid and they all check for and
    handle the case where setuid fails.

    - If stat or a similar system call is called from a context in which
    we can not map a uid we lie and return overflowuid. The LFS
    experience suggests not lying and returning an error code might be
    better, but the historical precedent with uids is different and I
    can not think of anything that would break by lying about a uid we
    can't map.

    - Capabilities are localized to the current user namespace making it
    safe to give the initial user in a user namespace all capabilities.

    My git tree covers all of the modifications needed to convert the core
    kernel and enough changes to make a system bootable to runlevel 1."

    Fix up trivial conflicts due to nearby independent changes in fs/stat.c

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
    userns: Silence silly gcc warning.
    cred: use correct cred accessor with regards to rcu read lock
    userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
    userns: Convert cgroup permission checks to use uid_eq
    userns: Convert tmpfs to use kuid and kgid where appropriate
    userns: Convert sysfs to use kgid/kuid where appropriate
    userns: Convert sysctl permission checks to use kuid and kgids.
    userns: Convert proc to use kuid/kgid where appropriate
    userns: Convert ext4 to user kuid/kgid where appropriate
    userns: Convert ext3 to use kuid/kgid where appropriate
    userns: Convert ext2 to use kuid/kgid where appropriate.
    userns: Convert devpts to use kuid/kgid where appropriate
    userns: Convert binary formats to use kuid/kgid where appropriate
    userns: Add negative depends on entries to avoid building code that is userns unsafe
    userns: signal remove unnecessary map_cred_ns
    userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
    userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
    userns: Convert stat to return values mapped from kuids and kgids
    userns: Convert user specfied uids and gids in chown into kuids and kgid
    userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
    ...

    Linus Torvalds
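
    The highlights about the kuid_t type, the reserved -1 value and the
    overflowuid fallback can be sketched in userspace as follows. All
    names ending in _sketch are hypothetical, the single-extent mapping
    is a simplification, and the value 65534 for the overflow id is an
    assumption made for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrapping the id in a struct makes mixing it with a plain integer
 * a compile-time type error, which is the point of a kuid_t-like type. */
typedef struct { uint32_t val; } kuid_sketch_t;

#define INVALID_ID_SKETCH  ((uint32_t)-1)  /* reserved error value */
#define OVERFLOW_ID_SKETCH 65534u          /* assumed overflow id */

/* One mapping extent: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) outside it. */
struct id_extent_sketch {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

static inline int kuid_eq_sketch(kuid_sketch_t a, kuid_sketch_t b)
{
    return a.val == b.val;
}

static inline int kuid_valid_sketch(kuid_sketch_t k)
{
    return k.val != INVALID_ID_SKETCH;
}

/* Map a namespace-local uid to the global id; invalid if unmapped. */
kuid_sketch_t make_kuid_sketch(const struct id_extent_sketch *e, uint32_t uid)
{
    kuid_sketch_t k = { INVALID_ID_SKETCH };

    assert(e != NULL);
    if (uid >= e->first && uid - e->first < e->count)
        k.val = e->lower_first + (uid - e->first);
    return k;
}

/* Map a global id back for reporting (as stat would); ids that cannot
 * be mapped are reported as the overflow id instead of an error. */
uint32_t from_kuid_munged_sketch(const struct id_extent_sketch *e,
                                 kuid_sketch_t k)
{
    assert(e != NULL);
    if (kuid_valid_sketch(k) && k.val >= e->lower_first &&
        k.val - e->lower_first < e->count)
        return e->first + (k.val - e->lower_first);
    return OVERFLOW_ID_SKETCH;
}
```

    A setuid-style caller would reject the request when
    kuid_valid_sketch() is false, while a stat-style caller would use
    the munged conversion and never fail.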
     

16 May, 2012

1 commit


15 May, 2012

1 commit

  • This patch (as1554) fixes a lockdep false-positive report. The
    problem arises because lockdep is unable to deal with the
    tree-structured locks created by the device core and sysfs.

    This particular problem involves a sysfs attribute method that
    unregisters itself, not from the device it was called for, but from a
    descendant device. Lockdep doesn't understand the distinction and
    reports a possible deadlock, even though the operation is safe.

    This is the sort of thing that would normally be handled by using a
    nested lock annotation; unfortunately it's not feasible to do that
    here. There's no sensible way to tell sysfs when attribute removal
    occurs in the context of a parent attribute method.

    As a workaround, the patch adds a new flag to struct attribute
    telling sysfs not to inform lockdep when it acquires a readlock on a
    sysfs_dirent instance for the attribute. The readlock is still
    acquired, but lockdep doesn't know about it and hence does not
    complain about impossible deadlock scenarios.

    Also added are macros for static initialization of attribute
    structures with the ignore_lockdep flag set. The three offending
    attributes in the USB subsystem are converted to use the new macros.

    Signed-off-by: Alan Stern
    Acked-by: Tejun Heo
    CC: Eric W. Biederman
    CC: Peter Zijlstra
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     

06 May, 2012

1 commit

    After we moved inode_sync_wait() from end_writeback() it doesn't make sense
    to call the function end_writeback() anymore. Rename it to clear_inode(),
    which better describes what the function really does: set the I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

03 May, 2012

1 commit


11 Apr, 2012

2 commits

  • In scsi at least two cases of the parent device being deleted before the
    child is added have been observed.

    1/ scsi is performing async scans and the device is removed prior to the
    async scan thread running (can happen with an inopportune / unlikely
    unplug during initial scan).

    2/ libsas discovery event running after the parent port has been torn
    down (this is a bug in libsas).

    These result in crash signatures like:
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
    IP: [] sysfs_create_dir+0x32/0xb6
    ...
    Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
    Stack:
    00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
    ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
    ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
    Call Trace:
    [] kobject_add_internal+0x120/0x1e3
    [] ? trace_hardirqs_on+0xd/0xf
    [] kobject_add_varg+0x41/0x50
    [] kobject_add+0x64/0x66
    [] device_add+0x12d/0x63a

    In this scenario the parent is still valid (because we have a
    reference), but it has been device_del()'d which means its kobj->sd
    pointer is NULL'd via:

    device_del()->kobject_del()->sysfs_remove_dir()

    ...and then sysfs_create_dir() (without this fix) goes ahead and
    de-references parent_sd via sysfs_ns_type():

    return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;

    This scenario is being fixed in scsi/libsas, but if other subsystems
    present the same ordering the system need not immediately crash.

    Cc: Greg Kroah-Hartman
    Cc: James Bottomley
    Signed-off-by: Dan Williams
    Signed-off-by: Greg Kroah-Hartman

    Dan Williams
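
    The guard described above can be sketched as follows. The types,
    the mask and shift values, and the choice of -ENOENT are assumptions
    made for this illustration and are not taken from the patch itself:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the namespace-type bit field. */
#define NS_TYPE_MASK_SKETCH  0x0f00u
#define NS_TYPE_SHIFT_SKETCH 8

struct sd_sketch { unsigned int s_flags; };
struct kobj_sketch { struct sd_sketch *sd; };   /* sd is NULL once deleted */

/* Same shape as the expression quoted above; sd must not be NULL. */
static unsigned int ns_type_sketch(const struct sd_sketch *sd)
{
    return (sd->s_flags & NS_TYPE_MASK_SKETCH) >> NS_TYPE_SHIFT_SKETCH;
}

/*
 * Refuse to create a child under a parent whose directory entry has
 * already been removed, instead of dereferencing a NULL pointer.
 */
int create_dir_sketch(const struct kobj_sketch *parent,
                      unsigned int *ns_type_out)
{
    assert(parent != NULL && ns_type_out != NULL);
    if (parent->sd == NULL)
        return -ENOENT;
    *ns_type_out = ns_type_sketch(parent->sd);
    return 0;
}
```

    With a check of this kind the late child registration fails with an
    error the caller can handle, rather than crashing the system.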
     
  • Do not let the kernel crash when a device is registered with
    sysfs while group attributes are not set (aka NULL).

    Warn about the offender with some information about the offending
    device.

    This would warn instead of trying NULL pointer deref like:
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] internal_create_group+0x83/0x1a0
    PGD 0
    Oops: 0000 [#1] SMP
    CPU 0
    Modules linked in:

    Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc1-x86_64 #3 HP ProLiant DL360 G4
    RIP: 0010:[] [] internal_create_group+0x83/0x1a0
    RSP: 0018:ffff88019485fd70 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
    RDX: ffff880192e99908 RSI: ffff880192e99630 RDI: ffffffff81a26c60
    RBP: ffff88019485fdc0 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff880192e99908 R11: 0000000000000000 R12: ffffffff81a16a00
    R13: ffff880192e99908 R14: ffffffff81a16900 R15: 0000000000000000
    FS: 0000000000000000(0000) GS:ffff88019bc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 0000000001a0c000 CR4: 00000000000007f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process swapper/0 (pid: 1, threadinfo ffff88019485e000, task ffff880194878000)
    Stack:
    ffff88019485fdd0 ffff880192da9d60 0000000000000000 ffff880192e99908
    ffff880192e995d8 0000000000000001 ffffffff81a16a00 ffff880192da9d60
    0000000000000000 0000000000000000 ffff88019485fdd0 ffffffff811527be
    Call Trace:
    [] sysfs_create_group+0xe/0x10
    [] device_add_groups+0x46/0x80
    [] device_add+0x46d/0x6a0
    ...

    Signed-off-by: Bruno Prémont
    Acked-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Bruno Prémont
     

10 Apr, 2012

1 commit


22 Mar, 2012

1 commit

  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     

21 Mar, 2012

1 commit


09 Mar, 2012

1 commit


25 Feb, 2012

1 commit

    This patch fixes the following two memory leak patterns reported by kmemleak.
    sysfs_sd_setsecdata() is called during a sys_lsetxattr() operation.
    It checks whether sd->s_iattr is NULL. If it is NULL, it calls
    sysfs_init_inode_attrs() to allocate memory.
    The code is:

    iattrs = sd->s_iattr;
    if (!iattrs)
    iattrs = sysfs_init_inode_attrs(sd);

    iattrs receives sysfs_init_inode_attrs()'s result, but sd->s_iattr
    never learns the address. So the correct address must be stored in
    sd->s_iattr so that the memory can be freed by other functions.

    unreferenced object 0xffff880250b73e60 (size 32):
    comm "systemd", pid 1, jiffies 4294683888 (age 94.553s)
    hex dump (first 32 bytes):
    73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f system_u:object_
    72 3a 73 79 73 66 73 5f 74 3a 73 30 00 00 00 00 r:sysfs_t:s0....
    backtrace:
    [] kmemleak_alloc+0x73/0x98
    [] __kmalloc+0x100/0x12c
    [] context_struct_to_string+0x106/0x210
    [] security_sid_to_context_core+0x10b/0x129
    [] security_sid_to_context+0x10/0x12
    [] selinux_inode_getsecurity+0x7d/0xa8
    [] selinux_inode_getsecctx+0x22/0x2e
    [] security_inode_getsecctx+0x16/0x18
    [] sysfs_setxattr+0x96/0x117
    [] __vfs_setxattr_noperm+0x73/0xd9
    [] vfs_setxattr+0x83/0xa1
    [] setxattr+0xcf/0x101
    [] sys_lsetxattr+0x6a/0x8f
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff
    unreferenced object 0xffff88024163c5a0 (size 96):
    comm "systemd", pid 1, jiffies 4294683888 (age 94.553s)
    hex dump (first 32 bytes):
    00 00 00 00 ed 41 00 00 00 00 00 00 00 00 00 00 .....A..........
    00 00 00 00 00 00 00 00 0c 64 42 4f 00 00 00 00 .........dBO....
    backtrace:
    [] kmemleak_alloc+0x73/0x98
    [] kmem_cache_alloc_trace+0xc4/0xee
    [] sysfs_init_inode_attrs+0x2a/0x83
    [] sysfs_setxattr+0xbf/0x117
    [] __vfs_setxattr_noperm+0x73/0xd9
    [] vfs_setxattr+0x83/0xa1
    [] setxattr+0xcf/0x101
    [] sys_lsetxattr+0x6a/0x8f
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff

    Signed-off-by: Masami Ichikawa
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Masami Ichikawa
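
    The leak pattern and its remedy can be sketched in userspace. The
    struct and function names below are hypothetical; the point is only
    that the freshly allocated pointer is stored back into the owning
    structure, so that later cleanup code can find and free it:

```c
#include <assert.h>
#include <stdlib.h>

struct attrs_sketch { int mode; };
struct node_sketch { struct attrs_sketch *s_iattr; };

/*
 * Return the node's attributes, allocating them on first use.
 * Assigning through sd->s_iattr (rather than to a local variable only)
 * is what keeps the allocation reachable. Returns NULL on allocation
 * failure.
 */
struct attrs_sketch *get_attrs_sketch(struct node_sketch *sd)
{
    assert(sd != NULL);
    if (sd->s_iattr == NULL)
        sd->s_iattr = calloc(1, sizeof(*sd->s_iattr));
    return sd->s_iattr;
}

/* Cleanup path: it can only free what the node still points to. */
void release_node_sketch(struct node_sketch *sd)
{
    assert(sd != NULL);
    free(sd->s_iattr);
    sd->s_iattr = NULL;
}
```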
     

03 Feb, 2012

1 commit


01 Feb, 2012

1 commit

  • This fixes a bug introduced with sysfs name hashes where renaming a
    network device appears to succeed but silently makes the sysfs files for
    that network device inaccessible.

    In at least one configuration this bug has stopped networking from
    coming up during boot.

    Signed-off-by: Eric W. Biederman
    Tested-by: Jiri Slaby
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

25 Jan, 2012

6 commits

    There is a misleading difference between /proc and /sys permissions: /proc is 0555 and /sys is 0755. But
    as it is impossible to create or unlink anything in /sys, it would be nice to have the same permissions.

    Signed-off-by: Vitaly Kuznetsov
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Kuznetsov
     
  • Tracking the number of subdirectories requires an extra field that increases
    the size of sysfs_dirent. nlinks are not particularly interesting for sysfs
    and the nlink counts are wrong when network namespaces are involved so stop
    counting them, and always return nlink == 1. Userspace already knows that
    directories with nlink == 1 have an nlink count they can't use to count
    subdirectories.

    This reduces the size of sysfs_dirent by 8 bytes on 64bit platforms.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
    Store the sysfs inode number in an unsigned int because the
    ida inode allocator can return at most a 31 bit number,
    reducing the size of struct sysfs_dirent by 8 bytes
    on 64bit platforms.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • On 32bit this reduces sizeof(struct sysfs_dirent) by 2 bytes.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Compute a 31 bit hash of directory entries (that can fit in a signed
    32bit off_t) and index the sysfs directory entries by that hash,
    replacing the per directory indexes by name and by inode. Because we
    now only use a single rbtree this reduces the size of sysfs_dirent by 2
    pointers. Because we have fewer cases to deal with the code is now
    simpler.

    For now I use the simple hash that the dcache uses as that is easy to
    use and seems simple enough.

    In addition to making the code simpler, using a hash for the file
    position in readdir brings sysfs in line with other filesystems that
    have non-trivial directory structures.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Recently an OOPS was observed from the usb serial io_ti driver when it tried to remove
    sysfs directories. Upon investigation it turns out this driver was always buggy
    and that a recent sysfs change had stopped guarding itself against removing attributes
    from sysfs directories that had already been removed. :(

    Historically we have been silent about attempting to remove files from nonexistent sysfs
    directories and have politely returned error codes. That has resulted in people writing
    broken code that ignores the error codes.

    Issue a kernel WARNING and a stack backtrace to make it clear in no uncertain
    terms that abusing sysfs is not ok, and the callers need to fix their code.

    This change transforms the io_ti OOPS into a more comprehensible error message
    and stack backtrace.

    Signed-off-by: Eric W. Biederman
    Reported-by: Wolfgang Frisch
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

04 Jan, 2012

3 commits


02 Nov, 2011

1 commit