11 Oct, 2016

2 commits

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Al Viro
     

28 Sep, 2016

1 commit

  • current_fs_time() uses struct super_block* as an argument.
    As per Linus's suggestion, this is changed to take struct
    inode* as a parameter instead. This is because the function
    is primarily meant for vfs inode timestamps.
    Also the function was renamed as per Arnd's suggestion.

    Change all calls to current_fs_time() to use the new
    current_time() function instead. current_fs_time() will be
    deleted.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Al Viro

    Deepa Dinamani
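
A minimal userspace sketch of the idea behind an inode-based timestamp helper, assuming the helper truncates the current time to the granularity recorded in the inode's superblock. The `mock_*` types and functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-ins for the kernel structures. */
struct mock_super_block {
    long s_time_gran;               /* timestamp granularity in ns */
};

struct mock_inode {
    struct mock_super_block *i_sb;  /* owning filesystem */
};

/* Round a timestamp down to a multiple of gran nanoseconds. */
static struct timespec mock_timespec_trunc(struct timespec t, long gran)
{
    assert(gran >= 1 && gran <= 1000000000L);
    if (gran == 1000000000L)
        t.tv_nsec = 0;
    else
        t.tv_nsec -= t.tv_nsec % gran;
    return t;
}

/* Inode-based variant: the granularity is looked up through the inode. */
static struct timespec mock_current_time(const struct mock_inode *inode,
                                         struct timespec now)
{
    return mock_timespec_trunc(now, inode->i_sb->s_time_gran);
}
```

Passing the inode rather than the superblock keeps call sites short, since code that updates a timestamp already holds the inode.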
     

27 Sep, 2016

2 commits

  • The result was being ignored and 0 was always returned.
    Return the actual result instead.

    Signed-off-by: Eric Engestrom
    Signed-off-by: Greg Kroah-Hartman

    Eric Engestrom
     
  • This is trivial to do:

    - add flags argument to simple_rename()
    - check that flags contains no bits other than RENAME_NOREPLACE
    - assign simple_rename() to .rename2 instead of .rename

    Filesystems converted:

    hugetlbfs, ramfs, bpf.

    Debugfs uses simple_rename() to implement debugfs_rename(), which is for
    debugfs instances to rename files internally, not for userspace filesystem
    access. For this case pass zero flags to simple_rename().

    Signed-off-by: Miklos Szeredi
    Acked-by: Greg Kroah-Hartman
    Cc: Alexei Starovoitov

    Miklos Szeredi
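
The flag check listed above can be sketched in userspace as follows. The `MOCK_*` constants and `mock_check_rename_flags()` are hypothetical names for illustration, and the choice of -EINVAL for unsupported flags is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values mirroring the rename flag bits. */
#define MOCK_RENAME_NOREPLACE (1u << 0)
#define MOCK_RENAME_EXCHANGE  (1u << 1)

/*
 * Accept a rename only if no flag other than NOREPLACE is set.
 * Returns 0 when the flags are supported, -EINVAL otherwise.
 */
static int mock_check_rename_flags(unsigned int flags)
{
    if (flags & ~MOCK_RENAME_NOREPLACE)
        return -EINVAL;
    return 0;
}
```

An internal caller such as the debugfs case described above would pass zero flags, which always passes this check.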
     

21 Sep, 2016

1 commit

  • This patch introduces an accessor which can be used
    by the users of debugfs (drivers, fs, ...) to get the
    original file_operations struct. It also removes the
    REAL_FOPS_DEREF macro in file.c and converts the code
    to use the public version.

    Previously, REAL_FOPS_DEREF was only available within
    the file.c of debugfs. But having a public getter
    available for debugfs users is important as some
    drivers (carl9170 and b43) use the pointer of the
    original file_operations in conjunction with container_of()
    within their debugfs implementations.

    Reviewed-by: Nicolai Stange
    Signed-off-by: Christian Lamparter
    Cc: stable # 4.7+
    Signed-off-by: Greg Kroah-Hartman

    Christian Lamparter
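
A userspace sketch of why a driver needs the pointer to its original file_operations: with container_of(), the address of the embedded member is used to recover the enclosing structure, which only works on the original object and not on a proxy copy. All `mock_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct mock_file_operations {
    int (*open)(void);
};

/* Hypothetical driver-private entry that embeds its fops. */
struct mock_driver_debugfs_entry {
    int id;
    struct mock_file_operations fops;
};

/* Userspace equivalent of the container_of() idiom. */
#define mock_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the driver entry from a pointer to its embedded fops. */
static struct mock_driver_debugfs_entry *
mock_entry_from_fops(struct mock_file_operations *real_fops)
{
    assert(real_fops != NULL);
    return mock_container_of(real_fops, struct mock_driver_debugfs_entry, fops);
}
```

If the pointer handed to `mock_entry_from_fops()` referred to a separately allocated proxy, the pointer arithmetic would land in unrelated memory, which is why a getter for the original fops matters.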
     

31 Aug, 2016

1 commit

  • debugfs_create_file_unsafe() is declared twice, identically: once in
    fs/debugfs/internal.h and once in include/linux/debugfs.h.

    All files that include the former also include the latter and thus,
    the declaration in fs/debugfs/internal.h is superfluous.

    Remove it.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     

01 Jul, 2016

1 commit


15 Jun, 2016

2 commits

  • Debugfs' open_proxy_open(), the ->open() installed at all inodes created
    through debugfs_create_file_unsafe(),
    - grabs a reference to the original file_operations instance passed to
    debugfs_create_file_unsafe() via fops_get(),
    - installs it at the file's ->f_op by means of replace_fops()
    - and calls fops_put() on it.

    Since the semantics of replace_fops() are such that the reference's
    ownership is transferred, the subsequent fops_put() will result in a double
    release when the file is eventually closed.

    Currently, this is not an issue since fops_put() basically does a
    module_put() on the file_operations' ->owner only and there don't exist any
    modules calling debugfs_create_file_unsafe() yet. This is expected to
    change in the future though, cf. commit c64688081490 ("debugfs: add
    support for self-protecting attribute file fops").

    Remove the call to fops_put() from open_proxy_open().

    Fixes: 9fd4dcece43a ("debugfs: prevent access to possibly dead
    file_operations at file open")
    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • Debugfs' full_proxy_open(), the ->open() installed at all inodes created
    through debugfs_create_file(),
    - grabs a reference to the original struct file_operations instance passed
    to debugfs_create_file(),
    - dynamically allocates a proxy struct file_operations instance wrapping
    the original
    - and installs this at the file's ->f_op.

    Afterwards, it calls the original ->open() and passes its return value back
    to the VFS layer.

    Now, if that return value indicates failure, the VFS layer won't ever call
    ->release() and thus, neither the reference to the original file_operations
    nor the memory for the proxy file_operations will get released, i.e. both
    are leaked.

    Upon failure of the original fops' ->open(), undo the proxy installation.
    That is:
    - Set the struct file ->f_op to what it had been when full_proxy_open()
    was entered.
    - Drop the reference to the original file_operations.
    - Free the memory holding the proxy file_operations.

    Fixes: 49d200deaa68 ("debugfs: prevent access to removed files' private
    data")
    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
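
The undo steps listed above can be sketched in userspace as follows. This is an illustrative model only: the `mock_*` types, the plain integer reference count, and the proxy counter are assumptions, not the debugfs implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct mock_fops {
    int (*open)(void);
    int refcount;           /* stand-in for the module reference */
};

struct mock_file {
    struct mock_fops *f_op;
};

static int mock_live_proxies;   /* proxies currently allocated */

static int mock_full_proxy_open(struct mock_file *filp,
                                struct mock_fops *real_fops)
{
    struct mock_fops *saved = filp->f_op;
    struct mock_fops *proxy;
    int r;

    assert(real_fops != NULL);
    real_fops->refcount++;                  /* grab a reference */
    proxy = calloc(1, sizeof(*proxy));
    if (!proxy) {
        real_fops->refcount--;
        return -ENOMEM;
    }
    mock_live_proxies++;
    filp->f_op = proxy;                     /* install the proxy */

    r = real_fops->open ? real_fops->open() : 0;
    if (r) {
        /* ->release() will never run, so undo everything here. */
        filp->f_op = saved;
        real_fops->refcount--;
        free(proxy);
        mock_live_proxies--;
    }
    return r;
}

/* Two sample ->open() implementations for exercising both paths. */
static int mock_open_fail(void) { return -EIO; }
static int mock_open_ok(void) { return 0; }
```

After a failed open, the file's `f_op`, the reference count, and the proxy counter are all back at their initial values; after a successful open, cleanup is left to the release path.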
     

30 May, 2016

1 commit


19 Apr, 2016

1 commit


13 Apr, 2016

8 commits

  • Starting with 4.1 the tracing subsystem has its own filesystem
    which is automounted in the tracing subdirectory of debugfs.
    Prior to this debugfs could be bind mounted in a cloned mount
    namespace, but if tracefs has been mounted under debugfs this
    now fails because there is a locked child mount. This creates
    a regression for container software which bind mounts debugfs
    to satisfy the assumption of some userspace software.

    In other pseudo filesystems such as proc and sysfs we're already
    creating mountpoints like this in such a way that no dirents can
    be created in the directories, allowing them to be exceptions to
    some MNT_LOCKED tests. In fact we already do this for the
    tracefs mountpoint in sysfs.

    Do the same in debugfs_create_automount(), since the intention
    here is clearly to create a mountpoint. This fixes the regression,
    as locked child mounts on permanently empty directories do not
    cause a bind mount to fail.

    Cc: stable@vger.kernel.org # v4.1+
    Signed-off-by: Seth Forshee
    Acked-by: Serge Hallyn
    Signed-off-by: Greg Kroah-Hartman

    Seth Forshee
     
  • The struct file_operations u32_array_fops associated with files created
    through debugfs_create_u32_array() has been lifetime aware already:
    everything needed for subsequent operation is copied to a ->f_private
    buffer at file opening time in u32_array_open(). Now, ->open() is always
    protected against file removal issues by the debugfs core.

    There is no need for the debugfs core to wrap the u32_array_fops
    with a file lifetime managing proxy.

    Make debugfs_create_u32_array() create its files in non-proxying operation
    mode by means of debugfs_create_file_unsafe().

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • Currently, the struct file_operations fops_blob associated with files
    created through the debugfs_create_blob() helpers is not file
    lifetime aware.

    Thus, a lifetime managing proxy is created around fops_blob each time such
    a file is opened which is an unnecessary waste of resources.

    Implement file lifetime management for the fops_blob file_operations.
    Namely, make read_file_blob() safe against file removals by means of
    debugfs_use_file_start() and debugfs_use_file_finish().

    Make debugfs_create_blob() create its files in non-proxying operation mode
    by means of debugfs_create_file_unsafe().

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • Currently, the struct file_operations fops_bool associated with files
    created through the debugfs_create_bool() helpers is not file
    lifetime aware.

    Thus, a lifetime managing proxy is created around fops_bool each time such
    a file is opened which is an unnecessary waste of resources.

    Implement file lifetime management for the fops_bool file_operations.
    Namely, make debugfs_read_file_bool() and debugfs_write_file_bool() safe
    against file removals by means of debugfs_use_file_start() and
    debugfs_use_file_finish().

    Make debugfs_create_bool() create its files in non-proxying operation mode
    through debugfs_create_mode_unsafe().

    Finally, purge debugfs_create_mode() as debugfs_create_bool() had been its
    last user.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • Currently, the struct file_operations associated with the integer attribute
    style files created through the debugfs_create_*() helpers are not file
    lifetime aware as they are defined by means of DEFINE_SIMPLE_ATTRIBUTE().

    Thus, a lifetime managing proxy is created around the original fops each
    time such a file is opened which is an unnecessary waste of resources.

    Migrate all usages of DEFINE_SIMPLE_ATTRIBUTE() within debugfs itself
    to DEFINE_DEBUGFS_ATTRIBUTE() in order to implement file lifetime managing
    within the struct file_operations thus defined.

    Introduce the debugfs_create_mode_unsafe() helper, analogous to
    debugfs_create_mode(), but distinct in that it creates the files in
    non-proxying operation mode through debugfs_create_file_unsafe().

    Feed all struct file_operations migrated to DEFINE_DEBUGFS_ATTRIBUTE()
    into debugfs_create_mode_unsafe() instead of former debugfs_create_mode().

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • In order to protect them against file removal issues, debugfs_create_file()
    creates a lifetime managing proxy around each struct file_operations
    handed in.

    In cases where this struct file_operations is able to manage file lifetime
    by itself already, the proxy created by debugfs is a waste of resources.

    The most common class of struct file_operations given to debugfs are those
    defined by means of the DEFINE_SIMPLE_ATTRIBUTE() macro.

    Introduce a DEFINE_DEBUGFS_ATTRIBUTE() macro to allow any
    struct file_operations of this class to be easily made file lifetime aware
    and thus, to be operated unproxied.

    Specifically, introduce debugfs_attr_read() and debugfs_attr_write()
    which wrap simple_attr_read() and simple_attr_write() under the protection
    of a debugfs_use_file_start()/debugfs_use_file_finish() pair.

    Make DEFINE_DEBUGFS_ATTRIBUTE() set the defined struct file_operations'
    ->read() and ->write() members to these wrappers.

    Export debugfs_create_file_unsafe() in order to allow debugfs users to
    create their files in non-proxying operation mode.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     
  • Upon return of debugfs_remove()/debugfs_remove_recursive(), it might
    still be attempted to access associated private file data through
    previously opened struct file objects. If that data has been freed by
    the caller of debugfs_remove*() in the meanwhile, the reading/writing
    process would either encounter a fault or, if the memory address in
    question has been reassigned again, unrelated data structures could get
    overwritten.

    However, since debugfs files are seldom removed, usually from module
    exit handlers only, the impact is very low.

    Currently, there are ~1000 call sites of debugfs_create_file() spread
    throughout the whole tree and touching all of those struct file_operations
    in order to make them file removal aware by means of checking the result of
    debugfs_use_file_start() from within their methods is unfeasible.

    Instead, wrap the struct file_operations by a lifetime managing proxy at
    file open:
    - In debugfs_create_file(), the original fops handed in has got stashed
    away in ->d_fsdata already.
    - In debugfs_create_file(), install a proxy file_operations factory,
    debugfs_full_proxy_file_operations, at ->i_fop.

    This proxy factory has got an ->open() method only. It carries out some
    lifetime checks and if successful, dynamically allocates and sets up a new
    struct file_operations proxy at ->f_op. Afterwards, it forwards to the
    ->open() of the original struct file_operations in ->d_fsdata, if any.

    The dynamically set up proxy at ->f_op has got a lifetime managing wrapper
    set for each of the methods defined in the original struct file_operations
    in ->d_fsdata.

    Its ->release()er frees the proxy again and forwards to the original
    ->release(), if any.

    In order not to mislead the VFS layer, it is strictly necessary to leave
    those fields blank in the proxy that have been NULL in the original
    struct file_operations also, i.e. aren't supported. This is why there is a
    need for dynamically allocated proxies. The choice made not to allocate a
    proxy instance for every dentry at file creation, but for every
    struct file object instantiated thereof is justified by the expected usage
    pattern of debugfs, namely that in general very few files get opened more
    than once at a time.

    The wrapper methods set in the struct file_operations implement lifetime
    managing by means of the SRCU protection facilities already in place for
    debugfs:
    They set up a SRCU read side critical section and check whether the dentry
    is still alive by means of debugfs_use_file_start(). If so, they forward
    the call to the original struct file_operations stored in ->d_fsdata, still
    under the protection of the SRCU read side critical section.
    This SRCU read side critical section prevents any pending debugfs_remove()
    and friends from returning to their callers. Since a file's private data must
    only be freed after the return of debugfs_remove(), the ongoing proxied
    call is guarded against any file removal race.

    If, on the other hand, the initial call to debugfs_use_file_start() detects
    that the dentry is dead, the wrapper simply returns -EIO and does not
    forward the call. Note that the ->poll() wrapper is special in that its
    signature does not allow for the return of arbitrary -EXXX values and thus,
    POLLHUP is returned here.

    In order not to pollute debugfs with wrapper definitions that aren't ever
    needed, I chose not to define a wrapper for every struct file_operations
    method possible. Instead, a wrapper is defined only for the subset of
    methods which are actually set by any debugfs users.
    Currently, these are:

    ->llseek()
    ->read()
    ->write()
    ->unlocked_ioctl()
    ->poll()

    The ->release() wrapper is special in that it does not protect the original
    ->release() in any way from dead files in order not to leak resources.
    Thus, any ->release() handed to debugfs must implement file lifetime
    management manually, if needed.
    For only 33 out of a total of 434 releasers handed in to debugfs, it could
    not be verified immediately whether they access data structures that might
    have been freed upon a debugfs_remove() return in the meanwhile.

    Export debugfs_use_file_start() and debugfs_use_file_finish() in order to
    allow any ->release() to manually implement file lifetime management.

    For a set of common cases of struct file_operations implemented by the
    debugfs core itself, future patches will incorporate file lifetime
    management directly within those in order to allow for their unproxied
    operation. Rename the original, non-proxying "debugfs_create_file()" to
    "debugfs_create_file_unsafe()" and keep it for future internal use by
    debugfs itself. Factor out code common to both into the new
    __debugfs_create_file().

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
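
A userspace sketch of the wrapper pattern described above: enter a read-side section, check whether the file is still alive, forward to the original method only if it is, and always leave the section afterwards. The counter standing in for the SRCU section and all `mock_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_dentry {
    int dead;                                   /* set once removed */
    long (*real_read)(char *buf, size_t len);   /* original method */
};

static int mock_readers;    /* stand-in for read-side section nesting */

/* Enter the read-side section and report whether the file is alive. */
static int mock_use_file_start(const struct mock_dentry *d)
{
    mock_readers++;
    return d->dead ? -EIO : 0;
}

/* Leave the read-side section; must pair with every start call. */
static void mock_use_file_finish(void)
{
    mock_readers--;
}

/* Lifetime-managing wrapper around the original ->read(). */
static long mock_proxy_read(struct mock_dentry *d, char *buf, size_t len)
{
    long ret = -EIO;

    assert(d->real_read != NULL);
    if (!mock_use_file_start(d))
        ret = d->real_read(buf, len);
    mock_use_file_finish();
    return ret;
}

/* Sample original method: marks the buffer and reports len bytes. */
static long mock_real_read(char *buf, size_t len)
{
    if (len)
        buf[0] = 'x';
    return (long)len;
}
```

Because the finish call runs on both paths, a remover that waits for all readers to drain cannot return while a forwarded call is still in progress.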
     
  • Nothing prevents a dentry found by path lookup before a return of
    __debugfs_remove() from actually getting opened after that return. Now, after
    the return of __debugfs_remove(), there are no guarantees whatsoever
    regarding the memory the corresponding inode's file_operations object
    had been kept in.

    Since __debugfs_remove() is seldom invoked, usually from module exit
    handlers only, the race is hard to trigger and the impact is very low.

    A discussion of the problem outlined above as well as a suggested
    solution can be found in the (sub-)thread rooted at

    http://lkml.kernel.org/g/20130401203445.GA20862@ZenIV.linux.org.uk
    ("Yet another pipe related oops.")

    Basically, Greg KH suggests to introduce an intermediate fops and
    Al Viro points out that a pointer to the original ones may be stored in
    ->d_fsdata.

    Follow this line of reasoning:
    - Add SRCU as a reverse dependency of DEBUG_FS.
    - Introduce a srcu_struct object for the debugfs subsystem.
    - In debugfs_create_file(), store a pointer to the original
    file_operations object in ->d_fsdata.
    - Make debugfs_remove() and debugfs_remove_recursive() wait for a
    SRCU grace period after the dentry has been d_delete()'d and before they
    return to their callers.
    - Introduce an intermediate file_operations object named
    "debugfs_open_proxy_file_operations". Its ->open() function checks,
    under the protection of a SRCU read lock, whether the dentry is still
    alive, i.e. has not been d_delete()'d and if so, tries to acquire a
    reference on the owning module.
    On success, it sets the file object's ->f_op to the original
    file_operations and forwards the ongoing open() call to the original
    ->open().
    - For clarity, rename the former debugfs_file_operations to
    debugfs_noop_file_operations -- they are in no way canonical.

    The choice of SRCU over "normal" RCU is justified by the fact that the
    former may also be used to protect ->i_private data from going away
    during the execution of a file's readers and writers which may (and do)
    sleep.

    Finally, introduce the fs/debugfs/internal.h header containing some
    declarations internal to the debugfs implementation.

    Signed-off-by: Nicolai Stange
    Signed-off-by: Greg Kroah-Hartman

    Nicolai Stange
     

30 Mar, 2016

2 commits

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_fs_time() instead.

    Signed-off-by: Deepa Dinamani
    Signed-off-by: Greg Kroah-Hartman

    Deepa Dinamani
     
  • Directory inodes should start off with i_nlink == 2 (one extra ref
    for "." entry). debugfs_create_automount() increases neither the
    i_nlink reference for current inode nor for parent inode.

    On attempt to remove the automount dentry, kernel complains:

    [ 86.288070] WARNING: CPU: 1 PID: 3616 at fs/inode.c:273 drop_nlink+0x3e/0x50()
    [ 86.288461] Modules linked in: debugfs_example2(O-)
    [ 86.288745] CPU: 1 PID: 3616 Comm: rmmod Tainted: G O 4.4.0-rc3-next-20151207+ #135
    [ 86.289197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150617_082717-anatol 04/01/2014
    [ 86.289696] ffffffff81be05c9 ffff8800b9e6fda0 ffffffff81352e2c 0000000000000000
    [ 86.290110] ffff8800b9e6fdd8 ffffffff81065142 ffff8801399175e8 ffff8800bb78b240
    [ 86.290507] ffff8801399175e8 ffff8800b73d7898 ffff8800b73d7840 ffff8800b9e6fde8
    [ 86.290933] Call Trace:
    [ 86.291080] [] dump_stack+0x4e/0x82
    [ 86.291340] [] warn_slowpath_common+0x82/0xc0
    [ 86.291640] [] warn_slowpath_null+0x1a/0x20
    [ 86.291932] [] drop_nlink+0x3e/0x50
    [ 86.292208] [] simple_unlink+0x4b/0x60
    [ 86.292481] [] simple_rmdir+0x37/0x50
    [ 86.292748] [] __debugfs_remove.part.16+0xa8/0xd0
    [ 86.293082] [] debugfs_remove_recursive+0xdb/0x1c0
    [ 86.293406] [] cleanup_module+0x2d/0x3b [debugfs_example2]
    [ 86.293762] [] SyS_delete_module+0x16b/0x220
    [ 86.294077] [] entry_SYSCALL_64_fastpath+0x12/0x6a
    [ 86.294405] ---[ end trace c9fc53353fe14a36 ]---
    [ 86.294639] ------------[ cut here ]------------

    To reproduce the issue it is enough to invoke these lines:

    autom = debugfs_create_automount("automount", NULL, vfsmount_cb, data);
    BUG_ON(IS_ERR_OR_NULL(autom));
    debugfs_remove(autom);

    The issue is fixed by increasing inode i_nlink references for current
    and parent inodes.

    Signed-off-by: Roman Pen
    Signed-off-by: Greg Kroah-Hartman

    Roman Pen
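
The link-count bookkeeping can be modelled in userspace as below. This is a sketch under assumptions: the `mock_*` helpers are hypothetical, the removal path is assumed to drop two links on the directory and one on its parent, and a drop on a zero count is only counted as a warning instead of underflowing.

```c
#include <assert.h>

struct mock_inode {
    unsigned int i_nlink;
};

/* Drop one link; returns 1 where the kernel would emit a warning. */
static int mock_drop_nlink(struct mock_inode *inode)
{
    if (inode->i_nlink == 0)
        return 1;
    inode->i_nlink--;
    return 0;
}

/* Correct directory creation: dir starts at 2, parent gains one. */
static void mock_create_dir(struct mock_inode *parent, struct mock_inode *dir)
{
    assert(parent != dir);
    dir->i_nlink = 2;       /* entry in parent plus "." */
    parent->i_nlink++;      /* ".." in the new directory */
}

/* Directory removal: two drops on dir, one on parent. */
static int mock_rmdir(struct mock_inode *parent, struct mock_inode *dir)
{
    int warnings = 0;

    warnings += mock_drop_nlink(dir);
    warnings += mock_drop_nlink(dir);
    warnings += mock_drop_nlink(parent);
    return warnings;
}
```

A directory created with a link count of 1 and no increment on the parent triggers one warning on removal in this model and leaves the parent one link short.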
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

11 Nov, 2015

1 commit

  • In debugfs' start_creating(), we pin the file system to safely access
    its root. When we fail to create a file, we unpin the file system via
    failed_creating() to release the mount count and eventually the reference
    of the vfsmount.

    However, when lookup_one_len() fails while we are still in
    start_creating(), we only release the parent's mutex but not the
    reference on the mount. This appears to have been handled in the past, but
    was missed when portions of __create_file() were split into
    start_creating() and end_creating() via 190afd81e4a5 ("debugfs: split the
    beginning and the end of __create_file() off"). Noticed during code
    review.

    Fixes: 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off")
    Cc: stable@vger.kernel.org # v4.0+
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Al Viro

    Daniel Borkmann
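
A small userspace model of the error path described above, assuming a simple counter for the pinned mount. The `mock_*` names and the -ENOENT error value are hypothetical; the point is only that the lookup failure path must drop the pin it took.

```c
#include <assert.h>
#include <errno.h>

static int mock_mount_count;    /* stand-in for the pinned mount count */

static int mock_pin_fs(void)
{
    mock_mount_count++;
    return 0;
}

static void mock_release_fs(void)
{
    mock_mount_count--;
}

/* Pin the filesystem, then look up the name; unpin if lookup fails. */
static int mock_start_creating(int lookup_fails)
{
    int err = mock_pin_fs();

    if (err)
        return err;
    if (lookup_fails) {
        mock_release_fs();      /* the release the fix adds */
        return -ENOENT;
    }
    return 0;                   /* caller keeps the pin on success */
}
```

Without the release in the failure branch, every failed lookup would leave the counter one higher than before the call.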
     

19 Oct, 2015

1 commit


18 Oct, 2015

4 commits

  • There aren't any read-only or write-only bool file ops, but there
    is a caller of debugfs_create_bool() that calls it with mode
    equal to 0400. This leads to the possibility of userspace
    modifying the file, so let's use the newly created
    debugfs_create_mode() helper here to fix this.

    Signed-off-by: Stephen Boyd
    Reviewed-by: Viresh Kumar
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     
  • There aren't any read-only or write-only size_t file ops, but there
    is a caller of debugfs_create_size_t() that calls it with mode
    equal to 0400. This leads to the possibility of userspace
    modifying the file, so let's use the newly created
    debugfs_create_mode() helper here to fix this.

    Signed-off-by: Stephen Boyd
    Reviewed-by: Viresh Kumar
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     
  • There aren't any read-only or write-only x64 file ops, but there
    is a caller of debugfs_create_x64() that calls it with mode equal
    to S_IRUGO. This leads to the possibility of userspace modifying
    the file, so let's use the newly created debugfs_create_mode()
    helper here to fix this.

    Signed-off-by: Stephen Boyd
    Reviewed-by: Viresh Kumar
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     
  • The code that creates debugfs file with different file ops based
    on the file mode is duplicated in each debugfs_create_*() API.
    Consolidate that code into debugfs_create_mode(), that takes
    three file ops structures so that we don't have to keep
    copy/pasting that logic.

    Signed-off-by: Stephen Boyd
    Reviewed-by: Viresh Kumar
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
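
The mode-based selection among three file ops structures can be sketched in userspace as follows. The enum, the `MOCK_*` masks, and the order of the two checks are assumptions made for illustration.

```c
#include <assert.h>

/* Which of the three file ops variants to install. */
enum mock_fops_kind { MOCK_FOPS_RW, MOCK_FOPS_RO, MOCK_FOPS_WO };

#define MOCK_S_IRUGO 0444u  /* read bits for user, group, other */
#define MOCK_S_IWUGO 0222u  /* write bits for user, group, other */

/* Pick the file ops variant that matches the permission bits. */
static enum mock_fops_kind mock_pick_fops(unsigned int mode)
{
    if (!(mode & MOCK_S_IWUGO))
        return MOCK_FOPS_RO;    /* nobody may write */
    if (!(mode & MOCK_S_IRUGO))
        return MOCK_FOPS_WO;    /* nobody may read */
    return MOCK_FOPS_RW;
}
```

With this selection in one place, a file created with mode 0400 gets read-only file ops, so no write method is installed for it.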
     

04 Oct, 2015

2 commits

  • According to commit a59d6293e537 ("debugfs: change parameter check in
    debugfs_remove() functions"), this is meant to make cleanup easier for
    callers. In that case it ought to be documented.

    Signed-off-by: Ulf Magnusson
    Signed-off-by: Greg Kroah-Hartman

    Ulf Magnusson
     
  • It's a bit odd that debugfs_create_bool() takes 'u32 *' as an argument,
    when all it needs is a boolean pointer.

    It would be better to update this API to make it accept 'bool *'
    instead, as that will make it more consistent and often more convenient.
    On top of that, bool takes just a byte.

    That required updates to all user sites as well, in the same commit
    that updates the API. The regmap core was also using
    debugfs_{read|write}_file_bool() directly, and its variable types were
    updated to bool as well.

    Signed-off-by: Viresh Kumar
    Acked-by: Mark Brown
    Acked-by: Charles Keepax
    Signed-off-by: Greg Kroah-Hartman

    Viresh Kumar
     

21 Jul, 2015

1 commit

  • The file read/write functions for bools have no special dependencies
    on debugfs internals and are sufficiently non-trivial to be worth
    exporting so clients can re-use the implementation.

    Signed-off-by: Richard Fitzgerald
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Mark Brown

    Richard Fitzgerald
     

05 Jul, 2015

1 commit

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     

04 Jul, 2015

1 commit

  • Pull user namespace updates from Eric Biederman:
    "Long ago and far away when user namespaces were young it was realized
    that allowing fresh mounts of proc and sysfs with only user namespace
    permissions could violate the basic rule that only root gets to decide
    if proc or sysfs should be mounted at all.

    Some hacks were put in place to reduce the worst of the damage that could
    be done, and the common sense rule was adopted that fresh mounts of
    proc and sysfs should allow no more than bind mounts of proc and
    sysfs. Unfortunately that rule has not been fully enforced.

    There are two kinds of gaps in that enforcement. Only filesystems
    mounted on empty directories of proc and sysfs should be ignored but
    the test for empty directories was insufficient. So in my tree
    directories on proc, sysctl and sysfs that will always be empty are
    created specially. Every other technique is imperfect as an ordinary
    directory can have entries added even after a readdir returns and
    shows that the directory is empty. Special creation of directories
    for mount points makes the code in the kernel a smidge clearer about
    its purpose. I asked container developers from the various container
    projects to help test this and no holes were found in the set of mount
    points on proc and sysfs that are created specially.

    This set of changes also starts enforcing the mount flags of fresh
    mounts of proc and sysfs are consistent with the existing mount of
    proc and sysfs. I expected this to be the boring part of the work but
    unfortunately unprivileged userspace winds up mounting fresh copies of
    proc and sysfs with noexec and nosuid clear when root set those flags
    on the previous mount of proc and sysfs. So for now only the atime,
    read-only and nodev attributes which userspace happens to keep
    consistent are enforced. Dealing with the noexec and nosuid
    attributes remains for another time.

    This set of changes also addresses an issue with how open file
    descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of
    /proc/<pid>/fd has been triggering a WARN_ON that has not been
    meaningful since it was added (as all of the code in the kernel was
    converted) and is now actively wrong.

    There is also a short list of issues that have not been fixed yet that
    I will mention briefly.

    It is possible to rename a directory from below to above a bind mount.
    At which point any directory pointers below the renamed directory can
    be walked up to the root directory of the filesystem. With user
    namespaces enabled a bind mount of the bind mount can be created
    allowing the user to pick a directory whose children they can rename
    to outside of the bind mount. This is challenging to fix and doubly
    so because all obvious solutions must touch code that is in the
    performance part of pathname resolution.

    As mentioned above there is also a question of how to ensure that
    developers by accident or with purpose do not introduce executable
    files on sysfs and proc and in doing so introduce security regressions
    in the current userspace that will not be immediately obvious and as
    such are likely to require breaking userspace in painful ways once
    they are recognized"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    vfs: Remove incorrect debugging WARN in prepend_path
    mnt: Update fs_fully_visible to test for permanently empty directories
    sysfs: Create mountpoints with sysfs_create_mount_point
    sysfs: Add support for permanently empty directories to serve as mount points.
    kernfs: Add support for always empty directories.
    proc: Allow creating permanently empty directories that serve as mount points
    sysctl: Allow creating permanently empty directories that serve as mountpoints.
    fs: Add helper functions for permanently empty directories.
    vfs: Ignore unlocked mounts in fs_fully_visible
    mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
    mnt: Refactor the logic for mounting sysfs and proc in a user namespace

    Linus Torvalds
     

01 Jul, 2015

1 commit

  • This allows for better documentation in the code and
    it allows for a simpler and fully correct version of
    fs_fully_visible to be written.

    The mount points converted and their filesystems are:
    /sys/hypervisor/s390/ s390_hypfs
    /sys/kernel/config/ configfs
    /sys/kernel/debug/ debugfs
    /sys/firmware/efi/efivars/ efivarfs
    /sys/fs/fuse/connections/ fusectl
    /sys/fs/pstore/ pstore
    /sys/kernel/tracing/ tracefs
    /sys/fs/cgroup/ cgroup
    /sys/kernel/security/ securityfs
    /sys/fs/selinux/ selinuxfs
    /sys/fs/smackfs/ smackfs

    Cc: stable@vger.kernel.org
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

24 Jun, 2015

1 commit


11 May, 2015

1 commit


27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

17 Apr, 2015

1 commit

  • Pull third hunk of vfs changes from Al Viro:
    "This contains the ->direct_IO() changes from Omar + saner
    generic_write_checks() + dealing with fcntl()/{read,write}() races
    (mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
    repeatedly looking at ->f_flags, which can be changed by fcntl(2),
    check ->ki_flags - which cannot) + infrastructure bits for dhowells'
    d_inode annotations + Christoph's switch of /dev/loop to
    vfs_iter_write()"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
    block: loop: switch to VFS ITER_BVEC
    configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
    VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
    VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
    VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
    NFS: Don't use d_inode as a variable name
    VFS: Impose ordering on accesses of d_inode and d_flags
    VFS: Add owner-filesystem positive/negative dentry checks
    nfs: generic_write_checks() shouldn't be done on swapout...
    ocfs2: use __generic_file_write_iter()
    mirror O_APPEND and O_DIRECT into iocb->ki_flags
    switch generic_write_checks() to iocb and iter
    ocfs2: move generic_write_checks() before the alignment checks
    ocfs2_file_write_iter: stop messing with ppos
    udf_file_write_iter: reorder and simplify
    fuse: ->direct_IO() doesn't need generic_write_checks()
    ext4_file_write_iter: move generic_write_checks() up
    xfs_file_aio_write_checks: switch to iocb/iov_iter
    generic_write_checks(): drop isblk argument
    blkdev_write_iter: expand generic_file_checks() call in there
    ...

    Linus Torvalds
     

16 Apr, 2015

1 commit