08 Sep, 2013

1 commit

  • Pull namespace changes from Eric Biederman:
    "This is an assorted mishmash of small cleanups, enhancements and bug
    fixes.

    The major theme is user namespace mount restrictions. nsown_capable
    is killed as it encourages not thinking about details that need to be
    considered. A very hard to hit pid namespace exiting bug was finally
    tracked and fixed. A couple of cleanups to the basic namespace
    infrastructure.

    Finally there is an enhancement that makes per user namespace
    capabilities usable as capabilities, and an enhancement that allows
    the per userns root to nice other processes in the user namespace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Kill nsown_capable it makes the wrong thing easy
    capabilities: allow nice if we are privileged
    pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
    userns: Allow PR_CAPBSET_DROP in a user namespace.
    namespaces: Simplify copy_namespaces so it is clear what is going on.
    pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
    sysfs: Restrict mounting sysfs
    userns: Better restrictions on when proc and sysfs can be mounted
    vfs: Don't copy mount bind mounts of /proc//ns/mnt between namespaces
    kernel/nsproxy.c: Improving a snippet of code.
    proc: Restrict mounting the proc filesystem
    vfs: Lock in place mounts from more privileged users

    Linus Torvalds
     

29 Aug, 2013

1 commit

  • Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
    over the net namespace. The principle here is if you create or have
    capabilities over it you can mount it, otherwise you get to live with
    what other people have mounted.

    Instead of testing this with a straight forward ns_capable call,
    perform this check the long and torturous way with kobject helpers,
    this keeps direct knowledge of namespaces out of sysfs, and preserves
    the existing sysfs abstractions.

    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

27 Aug, 2013

1 commit

  • Rely on the fact that another flavor of the filesystem is already
    mounted and do not rely on state in the user namespace.

    Verify that the mounted filesystem is not covered in any significant
    way. I would love to verify that the previously mounted filesystem
    has no mounts on top but there are at least the directories
    /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
    for other filesystems to mount on top of.

    Refactor the test into a function named fs_fully_visible and call that
    function from the mount routines of proc and sysfs. This makes this
    test local to the filesystems involved and the results current of when
    the mounts take place, removing a weird threading of the user
    namespace, the mount namespace and the filesystems themselves.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

22 Aug, 2013

1 commit


27 Mar, 2013

1 commit

  • Only allow unprivileged mounts of proc and sysfs if they are already
    mounted when the user namespace is created.

    proc and sysfs are interesting because they have content that is
    per namespace, and so fresh mounts are needed when new namespaces
    are created while at the same time proc and sysfs have content that
    is shared between every instance.

    Respect the policy of who may see the shared content of proc and sysfs
    by only allowing new mounts if there was an existing mount at the time
    the user namespace was created.

    In practice there are only two interesting cases: proc and sysfs are
    mounted at their usual places, proc and sysfs are not mounted at all
    (some form of mount namespace jail).

    Cc: stable@vger.kernel.org
    Acked-by: Serge Hallyn
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

18 Jan, 2013

1 commit


20 Nov, 2012

1 commit


14 Jul, 2012

2 commits


22 Mar, 2012

1 commit

  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     

21 Mar, 2012

1 commit


25 Jan, 2012

1 commit


13 Jun, 2011

1 commit

  • * new refcount in struct net, controlling actual freeing of the memory
    * new method in kobj_ns_type_operations (->drop_ns())
    * ->current_ns() semantics change - it's supposed to be followed by
    corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps
    the new refcount; net_drop_ns() decrements it and calls net_free() if the
    last reference has been dropped. Method renamed to ->grab_current_ns().
    * old net_free() callers call net_drop_ns() instead.
    * sysfs_exit_ns() is gone, along with a large part of callchain
    leading to it; now that the references stored in ->ns[...] stay valid we
    do not need to hunt them down and replace them with NULL. That fixes
    problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
    of sb->s_instances abuse.

    Note that struct net *shutdown* logics has not changed - net_cleanup()
    is called exactly when it used to be called. The only thing postponed by
    having a sysfs instance refering to that struct net is actual freeing of
    memory occupied by struct net.

    Signed-off-by: Al Viro

    Al Viro
     

29 Oct, 2010

1 commit


10 Aug, 2010

1 commit


22 May, 2010

4 commits

  • In Al's latest vfs tree the code is reworked and S_BIAS has been removed.

    It turns out that checking to see if a super block is in the
    middle of an unmount in sysfs_exit_ns is unnecessary because we
    remove the super_block from the s_supers/s_instances list before
    struct sysfs_super_info pointed to by sb->s_fs_info is freed.

    For now just delete the unnecessary check to see if a superblock is in the
    middle of an unmount, it isn't necessary with or without Al's changes
    and it just causes a needless conflict.

    Reported-by: Stephen Rothwell
    Cc: Al Viro
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The problem. When implementing a network namespace I need to be able
    to have multiple network devices with the same name. Currently this
    is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
    potentially a few other directories of the form /sys/ ... /net/*.

    What this patch does is to add an additional tag field to the
    sysfs dirent structure. For directories that should show different
    contents depending on the context such as /sys/class/net/, and
    /sys/devices/virtual/net/ this tag field is used to specify the
    context in which those directories should be visible. Effectively
    this is the same as creating multiple distinct directories with
    the same name but internally to sysfs the result is nicer.

    I am calling the concept of a single directory that looks like multiple
    directories all at the same path in the filesystem tagged directories.

    For the networking namespace the set of directories whose contents I need
    to filter with tags can depend on the presence or absence of hotplug
    hardware or which modules are currently loaded. Which means I need
    a simple race free way to setup those directories as tagged.

    To achieve a reace free design all tagged directories are created
    and managed by sysfs itself.

    Users of this interface:
    - define a type in the sysfs_tag_type enumeration.
    - call sysfs_register_ns_types with the type and it's operations
    - sysfs_exit_ns when an individual tag is no longer valid

    - Implement mount_ns() which returns the ns of the calling process
    so we can attach it to a sysfs superblock.
    - Implement ktype.namespace() which returns the ns of a syfs kobject.

    Everything else is left up to sysfs and the driver layer.

    For the network namespace mount_ns and namespace() are essentially
    one line functions, and look to remain that.

    Tags are currently represented a const void * pointers as that is
    both generic, prevides enough information for equality comparisons,
    and is trivial to create for current users, as it is just the
    existing namespace pointer.

    The work needed in sysfs is more extensive. At each directory
    or symlink creating I need to check if the directory it is being
    created in is a tagged directory and if so generate the appropriate
    tag to place on the sysfs_dirent. Likewise at each symlink or
    directory removal I need to check if the sysfs directory it is
    being removed from is a tagged directory and if so figure out
    which tag goes along with the name I am deleting.

    Currently only directories which hold kobjects, and
    symlinks are supported. There is not enough information
    in the current file attribute interfaces to give us anything
    to discriminate on which makes it useless, and there are
    no potential users which makes it an uninteresting problem
    to solve.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Benjamin Thery
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Add all of the necessary bioler plate to support
    multiple superblocks in sysfs.

    Signed-off-by: Eric W. Biederman
    Acked-by: Serge Hallyn
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

08 Mar, 2010

2 commits


25 Mar, 2009

3 commits

  • The sysfs_dirent serves as both an inode and a directory entry
    for sysfs. To prevent the sysfs inode numbers from being freed
    prematurely hold a reference to sysfs_dirent from the sysfs inode.

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Eric W. Biederman
    Cc: Tejun Heo
    Cc: Al Viro
    Cc: Cornelia Huck
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • sysfs_get_inode ultimately calls sysfs_count_nlink when the a
    directory inode is fectched. sysfs_count_nlink needs to be
    called under the sysfs_mutex to guard against the unlikely
    but possible scenario that the root directory is changing
    as we are counting the number entries in it, and just in
    general to be consistent.

    Signed-off-by: Eric W. Biederman
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • SYSFS_MAGIC has been added into magic.h, so only use that definition
    in magic.h to avoid potential consistency problem.

    Signed-off-by: Qinghuang Feng
    Signed-off-by: Greg Kroah-Hartman

    Qinghuang Feng
     

17 Oct, 2008

1 commit

  • Support sysfs_notify from atomic context with new sysfs_notify_dirent

    sysfs_notify currently takes sysfs_mutex.
    This means that it cannot be called in atomic context.
    sysfs_mutex is sometimes held over a malloc (sysfs_rename_dir)
    so it can block on low memory.

    In md I want to be able to notify on a sysfs attribute from
    atomic context, and I don't want to block on low memory because I
    could be in the writeout path for freeing memory.

    So:
    - export the "sysfs_dirent" structure along with sysfs_get, sysfs_put
    and sysfs_get_dirent so I can get the sysfs_dirent that I want to
    notify on and hold it in an md structure.
    - split sysfs_notify_dirent out of sysfs_notify so the sysfs_dirent
    can be notified on with no blocking (just a spinlock).

    Signed-off-by: Neil Brown
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Neil Brown
     

30 Apr, 2008

1 commit


17 Oct, 2007

1 commit

  • provide BDI constructor/destructor hooks

    [akpm@linux-foundation.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

13 Oct, 2007

8 commits

  • Sysfs has gone through considerable amount of reimplementation. Add
    copyrights. Any objections? :-)

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • sysfs_root is different from a regular directory dirent in that it's
    of type SYSFS_ROOT and doesn't have a name. These differences aren't
    used by anybody and only adds to complexity. Make sysfs_root a
    regular directory dirent.

    Signed-off-by: Tejun Heo
    Acked-by: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • The only uses of s_dentry left are the code that maintains
    s_dentry and trivial users that don't actually need it.
    So this patch removes the s_dentry maintenance code and
    restructures the trivial uses to use something else.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Tejun Heo
    Cc: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • This patch modifies the users of sysfs_mount to use sysfs_root
    instead (which is what they are looking for). It then
    makes sysfs_mount static to keep people from using it
    by accident.

    The net result is slightly faster and cleaner code.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Tejun Heo
    Cc: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Since sysfs no longer stores fs directory information in the dcache
    on a permanent basis kill_litter_super it is inappropriate and actively
    wrong. It will decrement the count on all dentries left in the
    dcache before trying to free them.

    At the moment this is not biting us only because we never unmount sysfs.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Tejun Heo
    Cc: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Tejun Heo
    Cc: Cornelia Huck
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • While shadow directories appear to be a good idea, the current scheme
    of controlling their creation and destruction outside of sysfs appears
    to be a locking and maintenance nightmare in the face of sysfs
    directories dynamically coming and going. Which can now occur for
    directories containing network devices when CONFIG_SYSFS_DEPRECATED is
    not set.

    This patch removes everything from the initial shadow directory support
    that allowed the shadow directory creation to be controlled at a higher
    level. So except for a few bits of sysfs_rename_dir everything from
    commit b592fcfe7f06c15ec11774b5be7ce0de3aa86e73 is now gone.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Cleanup semaphore.h

    Signed-off-by: Dave Young
    Signed-off-by: Greg Kroah-Hartman

    Dave Young
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

19 Jul, 2007

1 commit

  • While making sysfs indoes hashed, sysfs root inode was left out. Now
    that nlink accounting depends on the inode being on the hash, sysfs
    root inode nlink isn't adjusted properly.

    Put sysfs root inode on the inode hash by allocating it using
    sysfs_get_inode() like other sysfs inodes. While at it, massage
    comments a bit.

    Signed-off-by: Tejun Heo
    Acked-by: Jean Delvare
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

12 Jul, 2007

2 commits

  • This patch makes dentries and inodes for sysfs directories
    reclaimable.

    * sysfs_notify() is modified to walk sysfs_dirent tree instead of
    dentry tree.

    * sysfs_update_file() and sysfs_chmod_file() use sysfs_get_dentry() to
    grab the victim dentry.

    * sysfs_rename_dir() and sysfs_move_dir() grab all dentries using
    sysfs_get_dentry() on startup.

    * Dentries for all shadowed directories are pinned in memory to serve
    as lookup start point.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • Rename sysfs_dirent->s_type to s_flags, pack type into lower eight
    bits and reserve the rest for flags. sysfs_type() can used to access
    the type. All existing sd->s_type accesses are converted to use
    sysfs_type(). While at it, type test is changed to equality test
    instead of bit-and test where appropriate.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo