13 Nov, 2019

1 commit

  • kernfs_find_and_get_node_by_ino() looks up the kernfs_node matching the
    specified ino. On top of that, kernfs_get_node_by_id() and
    kernfs_fh_get_inode() implement full ID matching by testing the rest
    of the ID.

    On the surface, confusingly, the two are slightly different in that the
    latter uses 0 gen as a wildcard while the former doesn't - does it mean
    that the latter can't uniquely identify inodes w/ 0 gen? In practice,
    this is a distinction without a difference because the generation number
    starts at 1. There are no actual IDs with 0 gen, so it can always
    be safely used as a wildcard.

    Let's simplify the code by renaming kernfs_find_and_get_node_by_ino()
    to kernfs_find_and_get_node_by_id(), moving all lookup logic into it,
    and removing the now unnecessary kernfs_get_node_by_id().
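
    The wildcard rule described above can be sketched in user-space C as
    follows. This is only an illustration under stated assumptions:
    struct node_id, id_matches() and find_by_id() are hypothetical names,
    and the linear table stands in for whatever index the real lookup uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a node ID: inode number plus generation. */
struct node_id {
	uint32_t ino;
	uint32_t gen;	/* stored generations start at 1, so 0 is never stored */
};

/* True when 'want' identifies 'have'; want.gen == 0 matches any generation. */
bool id_matches(struct node_id have, struct node_id want)
{
	assert(have.gen != 0);	/* no real ID carries generation 0 */
	if (have.ino != want.ino)
		return false;
	return want.gen == 0 || want.gen == have.gen;
}

/* Single lookup entry point: scan the table and apply the full ID match. */
const struct node_id *find_by_id(const struct node_id *tbl, size_t n,
				 struct node_id want)
{
	for (size_t i = 0; i < n; i++)
		if (id_matches(tbl[i], want))
			return &tbl[i];
	return NULL;
}
```

    Because no stored ID has generation 0, a wildcard query can never be
    mistaken for an exact one, which is why a single lookup function can
    serve both kinds of caller.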

    Signed-off-by: Tejun Heo
    Reviewed-by: Greg Kroah-Hartman

    Tejun Heo
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 68 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Mar, 2019

2 commits

  • Replace the special handling of security xattrs with simple_xattrs, as
    is already done for the trusted xattrs. This simplifies the code and
    allows LSMs to use more than just a single xattr to do their business.

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: manual merge fixes]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     
  • Right now, kernfs_iattrs embeds the whole struct iattr, even though it
    doesn't really use half of its fields... This both leads to wasting
    space and makes the code look awkward. Let's just list the few fields
    we need directly in struct kernfs_iattrs.

    Signed-off-by: Ondrej Mosnacek
    Acked-by: Casey Schaufler
    [PM: merged a number of chunks manually due to fuzz]
    Signed-off-by: Paul Moore

    Ondrej Mosnacek
     

13 Mar, 2019

1 commit

  • Pull vfs mount infrastructure updates from Al Viro:
    "The rest of core infrastructure; no new syscalls in that pile, but the
    old parts are switched to new infrastructure. At that point
    conversions of individual filesystems can happen independently; some
    are done here (afs, cgroup, procfs, etc.), there's also a large series
    outside of that pile dealing with NFS (quite a bit of option-parsing
    stuff is getting used there - it's one of the most convoluted
    filesystems in terms of mount-related logics), but NFS bits are the
    next cycle fodder.

    It got seriously simplified since the last cycle; documentation is
    probably the weakest bit at the moment - I considered dropping the
    commit introducing Documentation/filesystems/mount_api.txt (cutting
    the size increase by quarter ;-), but decided that it would be better
    to fix it up after -rc1 instead.

    That pile allows to do followup work in independent branches, which
    should make life much easier for the next cycle. fs/super.c size
    increase is unpleasant; there's a followup series that allows to
    shrink it considerably, but I decided to leave that until the next
    cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
    afs: Use fs_context to pass parameters over automount
    afs: Add fs_context support
    vfs: Add some logging to the core users of the fs_context log
    vfs: Implement logging through fs_context
    vfs: Provide documentation for new mount API
    vfs: Remove kern_mount_data()
    hugetlbfs: Convert to fs_context
    cpuset: Use fs_context
    kernfs, sysfs, cgroup, intel_rdt: Support fs_context
    cgroup: store a reference to cgroup_ns into cgroup_fs_context
    cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
    cgroup_do_mount(): massage calling conventions
    cgroup: stash cgroup_root reference into cgroup_fs_context
    cgroup2: switch to option-by-option parsing
    cgroup1: switch to option-by-option parsing
    cgroup: take options parsing into ->parse_monolithic()
    cgroup: fold cgroup1_mount() into cgroup1_get_tree()
    cgroup: start switching to fs_context
    ipc: Convert mqueue fs to fs_context
    proc: Add fs_context support to procfs
    ...

    Linus Torvalds
     

28 Feb, 2019

1 commit

  • Make kernfs support superblock creation/mount/remount with fs_context.

    This requires that sysfs, cgroup and intel_rdt, which are built on kernfs,
    be made to support fs_context also.

    Notes:

    (1) A kernfs_fs_context struct is created to wrap fs_context and the
    kernfs mount parameters are moved in here (or are in fs_context).

    (2) kernfs_mount{,_ns}() are made into kernfs_get_tree(). The extra
    namespace tag parameter is passed in the context if desired.

    (3) kernfs_free_fs_context() is provided as a destructor for the
    kernfs_fs_context struct, but for the moment it does nothing except
    get called in the right places.

    (4) sysfs doesn't wrap kernfs_fs_context since it has no parameters to
    pass, but possibly this should be done anyway in case someone wants to
    add a parameter in future.

    (5) A cgroup_fs_context struct is created to wrap kernfs_fs_context and
    the cgroup v1 and v2 mount parameters are all moved there.

    (6) cgroup1 parameter parsing error messages are now handled by invalf(),
    which allows userspace to collect them directly.

    (7) cgroup1 parameter cleanup is now done in the context destructor rather
    than in the mount/get_tree and remount functions.

    Weirdies:

    (*) cgroup_do_get_tree() calls cset_cgroup_from_root() with locks held,
    but then uses the resulting pointer after dropping the locks. I'm
    told this is okay and needs commenting.

    (*) The cgroup refcount web. This really needs documenting.

    (*) cgroup2 only has one root?

    Add a suggestion from Thomas Gleixner in which the RDT enablement code is
    placed into its own function.

    [folded a leak fix from Andrey Vagin]

    Signed-off-by: David Howells
    cc: Greg Kroah-Hartman
    cc: Tejun Heo
    cc: Li Zefan
    cc: Johannes Weiner
    cc: cgroups@vger.kernel.org
    cc: fenghua.yu@intel.com
    Signed-off-by: Al Viro

    David Howells
     

08 Feb, 2019

1 commit

  • Create a new cache for kernfs_iattrs.
    Currently, memory is allocated with kzalloc(), which
    always returns aligned memory. On ARM, this is 64-byte aligned.
    To avoid wasting memory on rounding up the requested size,
    a new cache for kernfs_iattrs is created.

    The size of struct kernfs_iattrs is 80 bytes.
    On ARM, it is served from the kmalloc-128 slab,
    or from the kmalloc-192 slab if debug info is enabled.
    Extra bytes taken per object: 48.

    Total number of objects created : 4096
    Total saving = 48*4096 = 192 KB
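
    The arithmetic above can be reproduced with a short sketch. It
    assumes power-of-two size classes, which is a simplification that
    happens to match the 80-byte to kmalloc-128 case quoted here;
    size_class() and wasted_bytes() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power-of-two size class (assumption). */
size_t size_class(size_t req)
{
	size_t cls = 8;

	assert(req > 0);
	while (cls < req)
		cls <<= 1;
	return cls;
}

/* Bytes lost to rounding when nr_objs objects of obj_size are allocated. */
size_t wasted_bytes(size_t obj_size, size_t nr_objs)
{
	return (size_class(obj_size) - obj_size) * nr_objs;
}
```

    With obj_size = 80 and nr_objs = 4096 this gives 48 * 4096 = 196608
    bytes, i.e. the 192 KB quoted above.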

    After creating the new slab (with debug info enabled):
    sh-3.2# cat /proc/slabinfo
    ...
    kernfs_iattrs_cache 4069 4096 128 32 1 : tunables 0 0 0 : slabdata 128 128 0
    ...

    All testing has been done on ARM target.

    Signed-off-by: Ayush Mittal
    Signed-off-by: Vaneet Narang
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Ayush Mittal
     

21 Jul, 2018

1 commit

  • This change allows creating kernfs files and directories with arbitrary
    uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
    kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
    The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
    and always create objects belonging to the global root.

    When creating symlinks ownership (uid/gid) is taken from the target kernfs
    object.

    Co-Developed-by: Tyler Hicks
    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Tyler Hicks
    Signed-off-by: David S. Miller

    Dmitry Torokhov
     

29 Jul, 2017

2 commits

  • When working on adding exportfs operations in kernfs, I found it's hard
    to initialize dentry->d_fsdata in the exportfs operations. It looks like
    there is no way to do it without a race condition. Looking at the kernfs
    code closely, there is no point in setting dentry->d_fsdata.
    inode->i_private already points to the kernfs_node, and we can get the
    inode from a dentry. So this patch just deletes the d_fsdata usage.

    Acked-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     
  • Add an API to get kernfs node from inode number. We will need this to
    implement exportfs operations.

    This API will be used in blktrace too later, so it should be as fast as
    possible. To make the API lock-free, the kernfs node is freed in RCU
    context, and we depend on the kernfs_node count/ino number to filter
    out stale kernfs nodes.
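
    The filtering step can be sketched with C11 atomics. This is a
    user-space illustration only: it does not model RCU (which is what
    keeps the memory valid while it is inspected), and struct node,
    node_tryget() and lookup_by_ino() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	atomic_int count;	/* reference count; 0 means the node is dying */
	unsigned int ino;	/* inode number currently held by this slot */
};

/* Take a reference only if the count has not already dropped to zero. */
bool node_tryget(struct node *n)
{
	int c = atomic_load(&n->count);

	while (c > 0) {
		/* On failure, c is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&n->count, &c, c + 1))
			return true;
	}
	return false;
}

/* Return the node only if it is alive and still carries the wanted ino. */
struct node *lookup_by_ino(struct node *slot, unsigned int ino)
{
	if (!slot || !node_tryget(slot))
		return NULL;
	if (slot->ino != ino) {
		/* Stale hit: the slot now belongs to another inode number. */
		atomic_fetch_sub(&slot->count, 1);
		return NULL;
	}
	return slot;
}
```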

    Acked-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Shaohua Li
    Signed-off-by: Jens Axboe

    Shaohua Li
     

03 Mar, 2017

1 commit

  • Add a system call to make extended file information available, including
    file creation and some attribute flags where available through the
    underlying filesystem.

    The getattr inode operation is altered to take two additional arguments: a
    u32 request_mask and an unsigned int flags that indicate the
    synchronisation mode. This change is propagated to the vfs_getattr*()
    function.

    Functions like vfs_stat() are now inline wrappers around new functions
    vfs_statx() and vfs_statx_fd() to reduce stack usage.

    ========
    OVERVIEW
    ========

    The idea was initially proposed as a set of xattrs that could be retrieved
    with getxattr(), but the general preference proved to be for a new syscall
    with an extended stat structure.

    A number of requests were gathered for features to be included. The
    following have been included:

    (1) Make the fields a consistent size on all arches and make them large.

    (2) Spare space, request flags and information flags are provided for
    future expansion.

    (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
    __s64).

    (4) Creation time: The SMB protocol carries the creation time, which could
    be exported by Samba, which will in turn help CIFS make use of
    FS-Cache as that can be used for coherency data (stx_btime).

    This is also specified in NFSv4 as a recommended attribute and could
    be exported by NFSD [Steve French].

    (5) Lightweight stat: Ask for just those details of interest, and allow a
    netfs (such as NFS) to approximate anything not of interest, possibly
    without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
    Dilger] (AT_STATX_DONT_SYNC).

    (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
    its cached attributes are up to date [Trond Myklebust]
    (AT_STATX_FORCE_SYNC).

    And the following have been left out for future extension:

    (7) Data version number: Could be used by userspace NFS servers [Aneesh
    Kumar].

    Can also be used to modify fill_post_wcc() in NFSD which retrieves
    i_version directly, but has just called vfs_getattr(). It could get
    it from the kstat struct if it used vfs_xgetattr() instead.

    (There's disagreement on the exact semantics of a single field, since
    not all filesystems do this the same way).

    (8) BSD stat compatibility: Including more fields from the BSD stat such
    as creation time (st_btime) and inode generation number (st_gen)
    [Jeremy Allison, Bernd Schubert].

    (9) Inode generation number: Useful for FUSE and userspace NFS servers
    [Bernd Schubert].

    (This was asked for but later deemed unnecessary with the
    open-by-handle capability available and caused disagreement as to
    whether it's a security hole or not).

    (10) Extra coherency data may be useful in making backups [Andreas Dilger].

    (No particular data were offered, but things like last backup
    timestamp, the data version number and the DOS archive bit would come
    into this category).

    (11) Allow the filesystem to indicate what it can/cannot provide: A
    filesystem can now say it doesn't support a standard stat feature if
    that isn't available, so if, for instance, inode numbers or UIDs don't
    exist or are fabricated locally...

    (This requires a separate system call - I have an fsinfo() call idea
    for this).

    (12) Store a 16-byte volume ID in the superblock that can be returned in
    struct xstat [Steve French].

    (Deferred to fsinfo).

    (13) Include granularity fields in the time data to indicate the
    granularity of each of the times (NFSv4 time_delta) [Steve French].

    (Deferred to fsinfo).

    (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
    Note that the Linux IOC flags are a mess and filesystems such as Ext4
    define flags that aren't in linux/fs.h, so translation in the kernel
    may be a necessity (or, possibly, we provide the filesystem type too).

    (Some attributes are made available in stx_attributes, but the general
    feeling was that the IOC flags were too ext[234]-specific and shouldn't
    be exposed through statx this way).

    (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
    Michael Kerrisk].

    (Deferred, probably to fsinfo. Finding out if there's an ACL or
    seclabel might require extra filesystem operations).

    (16) Femtosecond-resolution timestamps [Dave Chinner].

    (A __reserved field has been left in the statx_timestamp struct for
    this - if there proves to be a need).

    (17) A set multiple attributes syscall to go with this.

    ===============
    NEW SYSTEM CALL
    ===============

    The new system call is:

    int ret = statx(int dfd,
                    const char *filename,
                    unsigned int flags,
                    unsigned int mask,
                    struct statx *buffer);

    The dfd, filename and flags parameters indicate the file to query, in a
    similar way to fstatat(). There is no equivalent of lstat() as that can be
    emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is
    also no equivalent of fstat() as that can be emulated by passing a NULL
    filename to statx() with the fd of interest in dfd.

    Whether or not statx() synchronises the attributes with the backing store
    can be controlled by OR'ing a value into the flags argument (this typically
    only affects network filesystems):

    (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
    respect.

    (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
    its attributes with the server - which might require data writeback to
    occur to get the timestamps correct.

    (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
    network filesystem. The resulting values should be considered
    approximate.

    mask is a bitmask indicating the fields in struct statx that are of
    interest to the caller. The user should set this to STATX_BASIC_STATS to
    get the basic set returned by stat(). It should be noted that asking for
    more information may entail extra I/O operations.

    buffer points to the destination for the data. This must be 256 bytes in
    size.

    ======================
    MAIN ATTRIBUTES RECORD
    ======================

    The following structures are defined in which to return the main attribute
    set:

    struct statx_timestamp {
            __s64   tv_sec;
            __s32   tv_nsec;
            __s32   __reserved;
    };

    struct statx {
            __u32   stx_mask;
            __u32   stx_blksize;
            __u64   stx_attributes;
            __u32   stx_nlink;
            __u32   stx_uid;
            __u32   stx_gid;
            __u16   stx_mode;
            __u16   __spare0[1];
            __u64   stx_ino;
            __u64   stx_size;
            __u64   stx_blocks;
            __u64   __spare1[1];
            struct statx_timestamp  stx_atime;
            struct statx_timestamp  stx_btime;
            struct statx_timestamp  stx_ctime;
            struct statx_timestamp  stx_mtime;
            __u32   stx_rdev_major;
            __u32   stx_rdev_minor;
            __u32   stx_dev_major;
            __u32   stx_dev_minor;
            __u64   __spare2[14];
    };

    The defined bits in request_mask and stx_mask are:

    STATX_TYPE              Want/got stx_mode & S_IFMT
    STATX_MODE              Want/got stx_mode & ~S_IFMT
    STATX_NLINK             Want/got stx_nlink
    STATX_UID               Want/got stx_uid
    STATX_GID               Want/got stx_gid
    STATX_ATIME             Want/got stx_atime{,_ns}
    STATX_MTIME             Want/got stx_mtime{,_ns}
    STATX_CTIME             Want/got stx_ctime{,_ns}
    STATX_INO               Want/got stx_ino
    STATX_SIZE              Want/got stx_size
    STATX_BLOCKS            Want/got stx_blocks
    STATX_BASIC_STATS       [The stuff in the normal stat struct]
    STATX_BTIME             Want/got stx_btime{,_ns}
    STATX_ALL               [All currently available stuff]

    stx_btime is the file creation time, stx_mask is a bitmask indicating the
    data provided and __spares*[] are where as-yet undefined fields can be
    placed.

    Time fields are structures with separate seconds and nanoseconds fields
    plus a reserved field in case we want to add even finer resolution. Note
    that times will be negative if before 1970; in such a case, the nanosecond
    fields will also be negative if not zero.

    The bits defined in the stx_attributes field convey information about a
    file, how it is accessed, where it is and what it does. The following
    attributes map to FS_*_FL flags and are the same numerical value:

    STATX_ATTR_COMPRESSED   File is compressed by the fs
    STATX_ATTR_IMMUTABLE    File is marked immutable
    STATX_ATTR_APPEND       File is append-only
    STATX_ATTR_NODUMP       File is not to be dumped
    STATX_ATTR_ENCRYPTED    File requires key to decrypt in fs

    Within the kernel, the supported flags are listed by:

    KSTAT_ATTR_FS_IOC_FLAGS

    [Are any other IOC flags of sufficient general interest to be exposed
    through this interface?]

    New flags include:

    STATX_ATTR_AUTOMOUNT Object is an automount trigger

    These are for the use of GUI tools that might want to mark files specially,
    depending on what they are.

    Fields in struct statx come in a number of classes:

    (0) stx_dev_*, stx_blksize.

    These are local system information and are always available.

    (1) stx_mode, stx_nlink, stx_uid, stx_gid, stx_[amc]time, stx_ino,
    stx_size, stx_blocks.

    These will be returned whether the caller asks for them or not. The
    corresponding bits in stx_mask will be set to indicate whether they
    actually have valid values.

    If the caller didn't ask for them, then they may be approximated. For
    example, NFS won't waste any time updating them from the server,
    unless as a byproduct of updating something requested.

    If the values don't actually exist for the underlying object (such as
    UID or GID on a DOS file), then the bit won't be set in the stx_mask,
    even if the caller asked for the value. In such a case, the returned
    value will be a fabrication.

    Note that there are instances where the type might not be valid, for
    instance Windows reparse points.

    (2) stx_rdev_*.

    This will be set only if stx_mode indicates we're looking at a
    blockdev or a chardev, otherwise will be 0.

    (3) stx_btime.

    Similar to (1), except this will be set to 0 if it doesn't exist.
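
    The field classes above imply that a consumer should look at
    stx_mask before trusting an optional field. A small sketch of that
    check for class (3), assuming <linux/stat.h> is available;
    get_btime() is a hypothetical helper name.

```c
#include <assert.h>
#include <linux/stat.h>		/* struct statx, STATX_BTIME */
#include <stdbool.h>
#include <stddef.h>

/*
 * Copy the creation time (seconds) out of an already filled struct statx.
 * Returns false when the filesystem did not report a creation time.
 */
bool get_btime(const struct statx *stx, long long *sec)
{
	assert(stx != NULL && sec != NULL);
	if (!(stx->stx_mask & STATX_BTIME))
		return false;
	*sec = stx->stx_btime.tv_sec;
	return true;
}
```

    The same pattern applies to the class (1) fields: test the matching
    STATX_* bit in stx_mask rather than assuming the value is real.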

    =======
    TESTING
    =======

    The following test program can be used to test the statx system call:

    samples/statx/test-statx.c

    Just compile and run, passing it paths to the files you want to examine.
    The file is built automatically if CONFIG_SAMPLES is enabled.

    Here's some example output. Firstly, an NFS directory that crosses to
    another FSID. Note that the AUTOMOUNT attribute is set because transiting
    this directory will cause d_automount to be invoked by the VFS.

    [root@andromeda ~]# /tmp/test-statx -A /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:26 Inode: 1703937 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000
    Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

    Secondly, the result of automounting on that directory.

    [root@andromeda ~]# /tmp/test-statx /warthog/data
    statx(/warthog/data) = 0
    results=7ff
    Size: 4096 Blocks: 8 IO Block: 1048576 directory
    Device: 00:27 Inode: 2 Links: 125
    Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
    Access: 2016-11-24 09:02:12.219699527+0000
    Modify: 2016-11-17 10:44:36.225653653+0000
    Change: 2016-11-17 10:44:36.225653653+0000

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

28 Dec, 2016

1 commit

  • Add ->open/release() methods to kernfs_ops. ->open() is called when
    the file is opened and ->release() when the file is either released or
    severed. These callbacks can be used, for example, to manage
    persistent caching objects over multiple seq_file iterations.

    Signed-off-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman
    Acked-by: Zefan Li

    Tejun Heo
     

07 Oct, 2016

1 commit


28 May, 2016

1 commit

  • smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
    we'd hashed the new dentry and attached it to inode, we need ->setxattr()
    instances getting the inode as an explicit argument rather than obtaining
    it from dentry.

    Similar change for ->getxattr() had been done in commit ce23e64. Unlike
    ->getxattr() (which is used by both selinux and smack instances of
    ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
    it got missed back then.

    Reported-by: Seung-Woo Kim
    Tested-by: Casey Schaufler
    Signed-off-by: Al Viro

    Al Viro
     

11 Apr, 2016

1 commit


19 Jun, 2015

1 commit

  • Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
    include/linux/kernfs.h. It obtains the matching inode for a
    kernfs_node.

    It will be used by cgroup for inode based permission checks for now
    but is generally useful.

    Signed-off-by: Tejun Heo
    Acked-by: Greg Kroah-Hartman

    Tejun Heo
     

21 Jan, 2015

1 commit


26 Apr, 2014

1 commit

  • Currently, there's no way to find out which super_blocks are
    associated with a given kernfs_root. Let's implement it - the planned
    inotify extension to kernfs_notify() needs it.

    Make kernfs_super_info point back to the super_block and chain it at
    kernfs_root->supers.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

15 Feb, 2014

1 commit

  • Currently kernfs_node_from_dentry() returns NULL for root dentry,
    because root_dentry->d_op == NULL.

    Due to this bug cgroupstats_build() returns -EINVAL for root cgroup.

    # mount -t cgroup -o cpuacct /cgroup
    # Documentation/accounting/getdelays -C /cgroup
    fatal reply error, errno -22

    With this fix:

    # Documentation/accounting/getdelays -C /cgroup
    sleeping 305, blocked 0, running 1, stopped 0, uninterruptible 1

    Signed-off-by: Li Zefan
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Li Zefan
     

08 Feb, 2014

2 commits

  • KERNFS_REMOVED is used to mark half-initialized and dying nodes so
    that they don't show up in lookups and deny adding new nodes under or
    renaming it; however, its role overlaps that of deactivation.

    It's necessary to deny addition of new children while removal is in
    progress; however, this role considerably intersects with deactivation
    - KERNFS_REMOVED prevents new children while deactivation prevents new
    file operations. There's no reason to have them separate making
    things more complex than necessary.

    This patch removes KERNFS_REMOVED.

    * Instead of KERNFS_REMOVED, each node now starts its life
    deactivated. This means that we now use both atomic_add() and
    atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler
    generates overflow warnings when negating INT_MIN as the negation
    can't be represented as a positive number. Nothing is actually
    broken but let's bump BIAS by one to avoid the warnings for archs
    which negate the subtrahend.

    * A new helper kernfs_active() which tests whether kn->active >= 0 is
    added for convenience and lockdep annotation. All KERNFS_REMOVED
    tests are replaced with negated kernfs_active() tests.

    * __kernfs_remove() is updated to deactivate, but not drain, all nodes
    in the subtree instead of setting KERNFS_REMOVED. This removes
    deactivation from kernfs_deactivate(), which is now renamed to
    kernfs_drain().

    * Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
    checks on the active ref.

    * Some comment style updates in the affected area.

    v2: Reordered before removal path restructuring. kernfs_active()
    dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE()
    used in the lookup paths.

    v3: Reverted most of v2 except for creating a new node with
    KN_DEACTIVATED_BIAS.
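
    The biased active counter described in the bullets above can be
    sketched with C11 atomics. This is a user-space illustration under
    stated assumptions, not the kernel code: DEACTIVATED_BIAS, struct
    node and the node_*() helpers are hypothetical names, and draining
    is not modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* INT_MIN + 1 rather than INT_MIN, so that -BIAS is representable. */
#define DEACTIVATED_BIAS (INT_MIN + 1)

struct node {
	atomic_int active;	/* >= 0: active refs; < 0: deactivated */
};

/* Every node starts its life deactivated. */
void node_init(struct node *n)
{
	atomic_init(&n->active, DEACTIVATED_BIAS);
}

void node_activate(struct node *n)
{
	atomic_fetch_sub(&n->active, DEACTIVATED_BIAS);
}

void node_deactivate(struct node *n)
{
	atomic_fetch_add(&n->active, DEACTIVATED_BIAS);
}

bool node_active(struct node *n)
{
	return atomic_load(&n->active) >= 0;
}

/* Take an active reference unless the node is deactivated. */
bool node_get_active(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* On failure, v is reloaded and the sign is re-checked. */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

void node_put_active(struct node *n)
{
	atomic_fetch_sub(&n->active, 1);
}
```

    Deactivating a node that still has active references leaves the
    counter negative, so new references are refused while existing ones
    can still be put; the counter returns to exactly the bias once the
    last one is dropped.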

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
    added because there were operations which should be performed outside
    kernfs_mutex after adding and removing kernfs_nodes. The necessary
    operations were recorded in kernfs_addrm_cxt and performed by
    kernfs_addrm_finish(); however, after the recent changes which
    relocated deactivation and unmapping so that they're performed
    directly during removal, the only operation kernfs_addrm_finish()
    performs is kernfs_put(), which can be moved inside the removal path
    too.

    This patch moves the kernfs_put() of the base ref to __kernfs_remove()
    and removes kernfs_addrm_cxt and kernfs_addrm_start/finish().

    * kernfs_add_one() is updated to grab and release kernfs_mutex itself.
    sysfs_addrm_start/finish() invocations around it are removed from
    all users.

    * __kernfs_remove() puts an unlinked node directly instead of chaining
    it to kernfs_addrm_cxt. Its callers are updated to grab and release
    kernfs_mutex instead of calling kernfs_addrm_start/finish() around
    it.

    v2: Rebased on top of "kernfs: associate a new kernfs_node with its
    parent on creation" which dropped @parent from kernfs_add_one().

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

18 Jan, 2014

1 commit

  • Once created, a kernfs_node is always destroyed by kernfs_put().
    Since ba7443bc656e ("sysfs, kernfs: implement
    kernfs_create/destroy_root()"), kernfs_put() depends on kernfs_root()
    to locate the ino_ida. kernfs_root() in turn depends on
    kernfs_node->parent being set for !dir nodes. This means that
    kernfs_put() of a !dir node requires its ->parent to be initialized.

    This leads to oops when a newly created !dir node is destroyed without
    going through kernfs_add_one() or after failing kernfs_add_one()
    before ->parent is set. kernfs_root() invoked from kernfs_put() will
    try to dereference NULL parent.

    Fix it by moving parent association to kernfs_new_node() from
    kernfs_add_one(). kernfs_new_node() now takes @parent instead of
    @root and determines the root from the parent and also sets the new
    node's parent properly. @parent parameter is removed from
    kernfs_add_one(). As there's no parent when creating the root node,
    __kernfs_new_node() which takes @root as before and doesn't set the
    parent is used in that case.

    This ensures that a kernfs_node in any stage in its life has its
    parent associated and thus can be put.
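
    The ordering problem and its fix can be sketched in user-space C.
    All names here (struct root, struct node, node_root(),
    new_node_in_root(), new_node(), put_node()) are hypothetical
    stand-ins, and the live_nodes counter merely plays the role of the
    per-root state that the put path has to locate.

```c
#include <assert.h>
#include <stdlib.h>

struct root {
	int live_nodes;		/* per-root state the put path must reach */
};

struct node {
	struct node *parent;	/* NULL only for the root directory */
	struct root *dir_root;	/* valid only for directories */
	int is_dir;
};

/* Non-directories reach the root through their parent directory. */
struct root *node_root(const struct node *n)
{
	if (n->parent)
		n = n->parent;
	assert(n->is_dir);
	return n->dir_root;
}

/* Used directly only for the root node, which has no parent. */
struct node *new_node_in_root(struct root *r, int is_dir)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->is_dir = is_dir;
	if (is_dir)
		n->dir_root = r;
	r->live_nodes++;
	return n;
}

/* Associate the parent at creation time, so a put is valid at any stage. */
struct node *new_node(struct node *parent, int is_dir)
{
	struct node *n = new_node_in_root(node_root(parent), is_dir);

	if (n)
		n->parent = parent;
	return n;
}

void put_node(struct node *n)
{
	node_root(n)->live_nodes--;
	free(n);
}
```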

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

14 Jan, 2014

3 commits

  • This reverts commit ae34372eb8408b3d07e870f1939f99007a730d28.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit f601f9a2bf7dc1f7ee18feece4c4e2fc6845d6c4.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This reverts commit 99177a34110889a8f2c36420c34e3bcc9bfd8a70.

    Tejun writes:
    I'm sorry but can you please revert the whole series?
    get_active() waiting while a node is deactivated has potential
    to lead to deadlock and that deactivate/reactivate interface is
    something fundamentally flawed and that cgroup will have to work
    with the remove_self() like everybody else. IOW, I think the
    first posting was correct.

    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

11 Jan, 2014

3 commits

  • kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
    added because there were operations which should be performed outside
    kernfs_mutex after adding and removing kernfs_nodes. The necessary
    operations were recorded in kernfs_addrm_cxt and performed by
    kernfs_addrm_finish(); however, after the recent changes which
    relocated deactivation and unmapping so that they're performed
    directly during removal, the only operation kernfs_addrm_finish()
    performs is kernfs_put(), which can be moved inside the removal path
    too.

    This patch moves the kernfs_put() of the base ref to __kernfs_remove()
    and removes kernfs_addrm_cxt and kernfs_addrm_start/finish().

    * kernfs_add_one() is updated to grab and release the parent's active
    ref and kernfs_mutex itself. kernfs_get/put_active() and
    kernfs_addrm_start/finish() invocations around it are removed from
    all users.

    * __kernfs_remove() puts an unlinked node directly instead of chaining
    it to kernfs_addrm_cxt. Its callers are updated to grab and release
    kernfs_mutex instead of calling kernfs_addrm_start/finish() around
    it.

    v2: Updated to fit the v2 restructuring of removal path.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
    the target file before kernfs_remove() finishes; however, it currently
    is being called from kernfs_addrm_finish() and has the same race
    problem as the original implementation of deactivation when there are
    multiple removers - only the remover which snatches the node to its
    addrm_cxt->removed list is guaranteed to wait for its completion
    before returning.

    It can be fixed by moving kernfs_unmap_bin_file() invocation from
    kernfs_addrm_finish() to __kernfs_remove(). The function may be
    called multiple times but that shouldn't do any harm.

    We end up dropping kernfs_mutex in the removal loop and the node may
    be removed in between by someone else. kernfs_unlink_sibling() is
    updated to test whether the node has already been removed and return
    accordingly. __kernfs_remove() in turn performs post-unlinking
    cleanup only if it actually unlinked the node.
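
    As an illustration only, the "clean up only if we actually unlinked"
    rule can be sketched in userspace C. The demo_* names and the boolean
    standing in for rbtree membership are assumptions made for this
    sketch and are not the kernfs implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernfs_node; not the kernel structure. */
struct demo_node {
	bool linked;	/* stands in for "is on the parent's children rbtree" */
	int cleanups;	/* counts how often post-unlinking cleanup ran */
};

/* Returns true only for the caller that actually unlinked the node. */
static bool demo_unlink_sibling(struct demo_node *n)
{
	if (!n->linked)
		return false;	/* somebody else already removed it */
	n->linked = false;
	return true;
}

/* May be called by several removers; cleanup runs for exactly one. */
static void demo_remove(struct demo_node *n)
{
	if (demo_unlink_sibling(n))
		n->cleanups++;
}
```

    Calling demo_remove() twice on the same node runs the cleanup once,
    which is the property the removal loop relies on after the lock has
    been dropped and retaken.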

    KERNFS_HAS_MMAP test is moved out of the unmap function into
    __kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily.
    While at it, drop the now meaningless "bin" qualifier from the
    function name.

    v2: Rewritten to fit the v2 restructuring of removal path. HAS_MMAP
    test relocated.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • KERNFS_REMOVED is used to mark half-initialized and dying nodes so
    that they don't show up in lookups and so that adding new nodes under
    them or renaming them is denied; however, its role overlaps those of
    deactivation and removal from rbtree.

    It's necessary to deny addition of new children while removal is in
    progress; however, this role considerably intersects with deactivation
    - KERNFS_REMOVED prevents new children while deactivation prevents new
    file operations. There's no reason to keep them separate and make
    things more complex than necessary.

    KERNFS_REMOVED is also used to decide whether a node is still visible
    to the vfs layer, which is rather redundant as an equivalent determination
    can be made by testing whether the node is on its parent's children
    rbtree or not.

    This patch removes KERNFS_REMOVED.

    * Instead of KERNFS_REMOVED, each node now starts its life
    deactivated. This means that we now use both atomic_add() and
    atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler
    generates an overflow warning when negating INT_MIN as the negation
    can't be represented as a positive number. Nothing is actually
    broken but let's bump BIAS by one to avoid the warnings for archs
    which negate the subtrahend.

    * KERNFS_REMOVED tests in add and rename paths are replaced with
    kernfs_get/put_active() of the target nodes. Due to the way the add
    path is structured now, active ref handling is done in the callers
    of kernfs_add_one(). This will be consolidated up later.

    * kernfs_remove_one() is updated to deactivate instead of setting
    KERNFS_REMOVED. This removes deactivation from kernfs_deactivate(),
    which is now renamed to kernfs_drain().

    * kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
    KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
    dropped. A node which is removed from the children rbtree is not
    included in the iteration in the first place. This means that a
    node may be visible through vfs a bit longer - it's now also visible
    after deactivation until the actual removal. This slightly enlarged
    window doesn't make any difference to userland.

    * Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
    checks on the active ref.

    * Some comment style updates in the affected area.
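
    As an illustration only, the reason for bumping the bias can be
    sketched with plain ints instead of atomics. DEMO_DEACTIVATED_BIAS
    and the demo_* helpers are hypothetical names for this sketch; the
    point is that INT_MIN + 1 can be negated (giving INT_MAX) while
    INT_MIN itself cannot.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical analogue of KN_DEACTIVATED_BIAS after the bump by one. */
#define DEMO_DEACTIVATED_BIAS (INT_MIN + 1)

/* A node starts its life deactivated: the counter holds the bias. */
static int demo_new_node_active(void)
{
	return DEMO_DEACTIVATED_BIAS;
}

/* Activation removes the bias from the counter. */
static int demo_activate(int active)
{
	return active - DEMO_DEACTIVATED_BIAS;
}

/* Deactivation adds the bias; existing active refs stay counted. */
static int demo_deactivate(int active)
{
	return active + DEMO_DEACTIVATED_BIAS;
}

/* New active references are allowed only while the counter is >= 0. */
static bool demo_is_active(int active)
{
	return active >= 0;
}
```

    With the bias at INT_MIN + 1, an implementation that turns the
    subtraction into an addition of the negated value still has a
    representable operand.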

    v2: Reordered before removal path restructuring. kernfs_active()
    dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE()
    used in the lookup paths.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

18 Dec, 2013

1 commit

  • Because sysfs used struct attribute, which is supposed to stay
    constant, sysfs didn't copy names when creating regular files. The
    specified string for name was supposed to stay constant. Such a
    distinction isn't inherent to kernfs. kernfs_create_file[_ns]()
    should be able to take the same @name as kernfs_create_dir[_ns]().

    As there can be a huge number of sysfs attributes, we still want to be
    able to use static names for sysfs attributes. This patch renames
    kernfs_create_file_ns_key() to __kernfs_create_file() and adds
    @name_is_static parameter so that the caller can explicitly indicate
    that @name can be used without copying. kernfs is updated to use
    KERNFS_STATIC_NAME to distinguish static and copied names.
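
    As an illustration only, the static versus copied name distinction
    can be sketched in userspace C. DEMO_STATIC_NAME, struct demo_node
    and the demo_* functions are hypothetical stand-ins chosen for this
    sketch, not the kernfs code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of KERNFS_STATIC_NAME. */
#define DEMO_STATIC_NAME 0x1u

struct demo_node {
	const char *name;
	unsigned int flags;
};

/* Borrow @name when the caller promises it is static, else copy it. */
static struct demo_node *demo_create(const char *name, bool name_is_static)
{
	struct demo_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	if (name_is_static) {
		n->name = name;
		n->flags |= DEMO_STATIC_NAME;
	} else {
		size_t len = strlen(name) + 1;
		char *copy = malloc(len);

		if (!copy) {
			free(n);
			return NULL;
		}
		memcpy(copy, name, len);
		n->name = copy;
	}
	return n;
}

/* Only a copied name is owned by the node and must be freed with it. */
static void demo_free(struct demo_node *n)
{
	if (!n)
		return;
	if (!(n->flags & DEMO_STATIC_NAME))
		free((void *)n->name);
	free(n);
}
```

    The flag records ownership at creation time, so the release path
    does not need to guess whether the string may be freed.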

    This patch doesn't introduce any behavior changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

12 Dec, 2013

6 commits

  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_*()/kernfs_*()/ in all internal functions
    * s/sysfs/kernfs/ in internal strings, comments and whatever is remaining
    * Uniformly rename various vfs operations so that they're consistently
    named and distinguishable.

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_mutex/kernfs_mutex/
    * s/sysfs_dentry_ops/kernfs_dops/
    * s/sysfs_dir_operations/kernfs_dir_fops/
    * s/sysfs_dir_inode_operations/kernfs_dir_iops/
    * s/kernfs_file_operations/kernfs_file_fops/ - renamed for consistency
    * s/sysfs_symlink_inode_operations/kernfs_symlink_iops/
    * s/sysfs_aops/kernfs_aops/
    * s/sysfs_backing_dev_info/kernfs_bdi/
    * s/sysfs_inode_operations/kernfs_iops/
    * s/sysfs_dir_cachep/kernfs_node_cache/
    * s/sysfs_ops/kernfs_sops/

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/SYSFS_DIR/KERNFS_DIR/
    * s/SYSFS_KOBJ_ATTR/KERNFS_FILE/
    * s/SYSFS_KOBJ_LINK/KERNFS_LINK/
    * s/SYSFS_{TYPE_FLAGS}/KERNFS_{TYPE_FLAGS}/
    * s/SYSFS_FLAG_{FLAG}/KERNFS_{FLAG}/
    * s/sysfs_type()/kernfs_type()/
    * s/SD_DEACTIVATED_BIAS/KN_DEACTIVATED_BIAS/

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_open_dirent/kernfs_open_node/
    * s/sysfs_open_file/kernfs_open_file/
    * s/sysfs_inode_attrs/kernfs_iattrs/
    * s/sysfs_addrm_cxt/kernfs_addrm_cxt/
    * s/sysfs_super_info/kernfs_super_info/
    * s/sysfs_info()/kernfs_info()/
    * s/sysfs_open_dirent_lock/kernfs_open_node_lock/
    * s/sysfs_open_file_mutex/kernfs_open_file_mutex/
    * s/sysfs_of()/kernfs_of()/

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    The s_ prefix for kernfs members is used inconsistently and is now a
    misnomer. It's not like kernfs_node is used widely across the kernel
    making the ability to grep for the members particularly useful. Let's
    just drop the prefix.

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • kernfs has just been separated out from sysfs and we're already in
    full conflict mode. Nothing can make the situation any worse. Let's
    take the chance to name things properly.

    This patch performs the following renames.

    * s/sysfs_elem_dir/kernfs_elem_dir/
    * s/sysfs_elem_symlink/kernfs_elem_symlink/
    * s/sysfs_elem_attr/kernfs_elem_file/
    * s/sysfs_dirent/kernfs_node/
    * s/sd/kn/ in kernfs proper
    * s/parent_sd/parent/
    * s/target_sd/target/
    * s/dir_sd/parent/
    * s/to_sysfs_dirent()/rb_to_kn()/
    * misc renames of local vars when they conflict with the above

    Because md, mic and gpio dig into sysfs details, this patch ends up
    modifying them. All are sysfs_dirent renames and trivial. While we
    can avoid these by introducing a dummy wrapping struct sysfs_dirent
    around kernfs_node, given the limited usage outside kernfs and sysfs
    proper, I don't think such a workaround is called for.

    This patch is strictly rename only and doesn't introduce any
    functional difference.

    - mic / gpio renames were missing. Spotted by kbuild test robot.

    Signed-off-by: Tejun Heo
    Cc: Neil Brown
    Cc: Linus Walleij
    Cc: Ashutosh Dixit
    Cc: kbuild test robot
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

04 Dec, 2013

2 commits

  • kernfs inherited "security.*" xattr support from sysfs. This patch
    extends xattr support to "trusted.*" using simple_xattr_*(). As
    trusted xattrs are restricted to CAP_SYS_ADMIN, simple_xattr_*() which
    uses kernel memory for storage shouldn't be problematic.

    Note that the existing "security.*" support doesn't implement
    get/remove/list and this patch only implements those ops for
    "trusted.*". We probably want to extend those ops to include support
    for "security.*".

    This patch will allow using kernfs from cgroup which requires
    "trusted.*" xattr support.

    Signed-off-by: Tejun Heo
    Cc: David P. Quigley
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • sysfs_init_inode_attrs() is a bit clumsy to use, requiring the caller
    to check whether @sd->s_iattr is already set or not. Rename it to
    sysfs_inode_attrs(), update it to check whether @sd->s_iattr is
    already initialized before trying to initialize it and return
    @sd->s_iattr. This simplifies the callers.
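
    As an illustration only, the "initialize on first use and return
    the pointer" pattern described above can be sketched in userspace C.
    The demo_* names and the default mode are assumptions for this
    sketch, not the sysfs code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the attrs structure and its owner. */
struct demo_attrs {
	unsigned int mode;
};

struct demo_node {
	struct demo_attrs *iattr;	/* NULL until first requested */
};

/* Return the node's attrs, allocating them on first use. */
static struct demo_attrs *demo_inode_attrs(struct demo_node *n)
{
	if (n->iattr)
		return n->iattr;	/* already initialized */

	n->iattr = calloc(1, sizeof(*n->iattr));
	if (!n->iattr)
		return NULL;

	n->iattr->mode = 0644;	/* arbitrary default for the sketch */
	return n->iattr;
}
```

    Callers then only test the return value for NULL instead of
    checking the field themselves before calling an init function.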

    While at it,

    * Rename struct sysfs_inode_attrs pointer variables to "attrs". As
    kernfs no longer deals with "struct attribute", this isn't confusing
    and makes it easier to distinguish from struct iattr pointers.

    * A new field will be added to sysfs_inode_attrs. Reindent in
    preparation.

    This patch doesn't introduce any behavior changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

30 Nov, 2013

3 commits

  • fs/kernfs/kernfs-internal.h needed to include fs/sysfs/sysfs.h because
    part of kernfs core implementation was living in sysfs.

    fs/sysfs/sysfs.h needed to include fs/kernfs/kernfs-internal.h because
    include/linux/kernfs.h didn't expose enough interface.

    The separation is complete and neither is true anymore. Remove the
    cross inclusion and make sysfs a proper user of kernfs.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • sysfs_dirent includes some information which should be available to
    kernfs users - the type, flags, name and parent pointer. This patch
    moves the sysfs_dirent definition from kernfs/kernfs-internal.h to
    include/linux/kernfs.h so that kernfs users can access them.

    The type part of flags is exported as enum kernfs_node_type and the
    flags as kernfs_node_flag. sysfs_type() and kernfs_enable_ns() are
    moved to include/linux/kernfs.h and the former is updated to return
    the enum type. sysfs_dirent->s_parent and ->s_name are marked
    explicitly as public.

    This patch doesn't introduce any functional changes.

    v2: Flags exported too and kernfs_enable_ns() definition moved.

    v3: While moving kernfs_enable_ns() to include/linux/kernfs.h, v1 and
    v2 put the definition outside CONFIG_SYSFS replacing the dummy
    implementation with the actual implementation too. Unfortunately,
    this can lead to an oops when !CONFIG_SYSFS because
    kernfs_enable_ns() may be called on a NULL @sd and now tries to
    dereference @sd instead of not doing anything. This issue was
    reported by Yuanhan Liu.
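
    As an illustration only, the difference between the real and the
    dummy implementation can be sketched as follows. DEMO_CONFIG_SYSFS,
    struct demo_node and demo_enable_ns() are hypothetical names for
    this sketch; the dummy variant must not touch its argument because
    callers may pass NULL when the feature is compiled out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the node structure. */
struct demo_node {
	bool ns_enabled;
};

#ifdef DEMO_CONFIG_SYSFS
/* Real implementation: dereferences @n, so @n must be valid. */
static inline void demo_enable_ns(struct demo_node *n)
{
	n->ns_enabled = true;
}
#else
/* Dummy implementation: does nothing, so a NULL @n is harmless. */
static inline void demo_enable_ns(struct demo_node *n)
{
	(void)n;
}
#endif
```

    Keeping the dummy inside the disabled branch preserves the "do
    nothing" behavior that callers depend on in that configuration.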

    Signed-off-by: Tejun Heo
    Reported-by: Yuanhan Liu
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • Move core mount code to fs/kernfs/mount.c. The respective
    declarations in fs/sysfs/sysfs.h are moved to
    fs/kernfs/kernfs-internal.h.

    This is pure relocation.

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo